Summarization – CS 257
Chapter 13, Database Systems: The Complete Book
Submitted by: Nitin Mathur
Submitted to: Dr. T.Y. Lin
What is an Index?
[Figure: a value is given to the index, which returns the matching records from the blocks holding records.]
An index is any data structure that takes as input a property of records, typically the value of one or more fields, and finds the records with that property "quickly".
Indexes on Sequential Files
[Figure: a sequential file whose records have keys 10, 20, ..., 80, stored in key order.]
This is a sequential file in which tuples are sorted by their primary key.
Definitions to Study Index Structures
• Data File: a sorted sequential file.
• Index File: a file consisting of key-pointer pairs.
• Search Key: a search key K in the index file is associated with a pointer to a data-file record that has search key K.
Dense Indexes
[Figure: a dense index with keys 10, 20, ..., 80, each key pointing to the record with that key in the sequential file.]
There is an entry in the index file for every record in the data file.
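A minimal sketch of a dense-index lookup, assuming the index file is held as two parallel Python lists (sorted keys and their pointers) and that a pointer is a hypothetical (block, slot) pair. Because every record has an entry, a binary search over the index alone decides whether the key exists.

```python
import bisect
from typing import List, Optional, Tuple

Pointer = Tuple[int, int]  # hypothetical (block number, slot within block)

def dense_lookup(keys: List[int], ptrs: List[Pointer], key: int) -> Optional[Pointer]:
    """Return the pointer for `key`, or None if no record has that key.

    `keys` holds every key of the data file in sorted order, and
    `ptrs[i]` points to the record whose key is `keys[i]`.
    """
    i = bisect.bisect_left(keys, key)  # binary search in the index file
    if i < len(keys) and keys[i] == key:
        return ptrs[i]
    return None  # absence is decided without reading the data file

# Keys 10..80 with, as an assumption, two records per data block.
keys = [10, 20, 30, 40, 50, 60, 70, 80]
ptrs = [(i // 2, i % 2) for i in range(len(keys))]
```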
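Sparse Indexes
[Figure: a sparse index with keys 10, 30, 50, 70, 90, 110, 130, 150, one per data block, pointing into the sequential file with keys 10 through 80.]
A sparse index holds only one key-pointer pair per data block. The key is that of the first record in the data block.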
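A sketch of a sparse-index lookup under the same kind of assumptions: the data file is a hypothetical list of sorted blocks, and the index keeps only the first key of each block. We find the largest index key not exceeding the search key, then search that one block.

```python
import bisect
from typing import List, Optional, Tuple

def sparse_lookup(first_keys: List[int], blocks: List[List[int]], key: int) -> Optional[Tuple[int, int]]:
    """Find `key` using a sparse index.

    `first_keys[b]` is the key of the first record in data block `blocks[b]`.
    Returns (block number, position in block) or None.
    """
    b = bisect.bisect_right(first_keys, key) - 1  # largest index key <= key
    if b < 0:
        return None  # key is smaller than every key in the file
    block = blocks[b]  # the only data block that could hold the key
    pos = bisect.bisect_left(block, key)
    if pos < len(block) and block[pos] == key:
        return (b, pos)
    return None

# Hypothetical data file: keys 10..80, two records per block.
blocks = [[10, 20], [30, 40], [50, 60], [70, 80]]
first_keys = [blk[0] for blk in blocks]  # one index entry per block: 10, 30, 50, 70
```

Unlike the dense version, a miss is only confirmed after reading the candidate block. Running the same search over the blocks of the index itself gives a multi-level lookup.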
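Multiple Levels of Indexes
[Figure: a second-level sparse index with keys 10, 90, 170, 250, 330, 410, 490, 570 pointing to the blocks of the first-level sparse index (10, 30, 50, ..., 150), which in turn points to the data file (10, 20, ..., 80).]
Adding a second level of sparse index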
Managing Indexes during Data Modification
• Create / delete overflow blocks
• Insert new blocks in the sequential order
• Slide tuples to adjacent blocks

Action                           Dense Index   Sparse Index
Create empty overflow block      None          None
Delete empty overflow block      None          None
Create empty sequential block    None          Insert
Delete empty sequential block    None          Delete
Insert record                    Insert        Update (?)
Delete record                    Delete        Update (?)
Slide record                     Update        Update (?)

How actions on the sequential file affect the index file
Secondary Indexes
• Why do we need secondary indexes? Consider the query:
    SELECT name, address
    FROM moviestar
    WHERE birthday = DATE '01/09/2008';
We need a secondary index on birthday to help with such queries.
• With the advent of the World Wide Web, keeping documents online and retrieving them has become one of the largest database problems.
• The simplest approach to document retrieval is to create a separate index for each word (problem: wasted storage space).
• The other approach is to use an inverted index.
Document Retrieval and Inverted Index
The records are a collection of documents.
• The inverted index itself consists of a set of word-pointer pairs.
• The inverted-index pointers refer to positions in a bucket file.
• Pointers in the bucket file can be:
  – pointers to documents, or
  – pointers to occurrences of the word (for example, a pair consisting of the first block of the document and an integer giving the number of the word within the document).
• When the pointers refer to occurrences of words, we can include extra information in the bucket array. For example, for documents written in HTML or XML we can also include the markup associated with each word, so we can distinguish between titles, headers, and so on.
Inverted Index
Stemming: remove suffixes to find the stem of each word (e.g., plurals can be treated as their singular versions).
Stop words: words such as "the" or "and" are excluded from the inverted index.
Example: with reference to the next figure, suppose we want to find documents about dogs that compare them with cats.
• This is difficult to solve without understanding the text.
• However, we could get a good hint if we search for documents that
1. mention dogs in the title, and
2. mention cats in an anchor (<a href>).
More Information-Retrieval Techniques to Improve Effectiveness
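A toy sketch of an inverted index with stop words, crude stemming, and markup-aware buckets, assuming each document is already tokenized into (word, kind) pairs where kind marks a title, an anchor, or ordinary text. The stop-word list and the strip-a-trailing-s "stemmer" are illustrative stand-ins, not a real stemming algorithm.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

STOP_WORDS = {"the", "and", "a", "an", "of", "to", "in"}  # tiny illustrative list

def stem(word: str) -> str:
    """Crude stand-in for stemming: treat a trailing 's' as a plural marker."""
    return word[:-1] if word.endswith("s") and len(word) > 3 else word

def build_index(docs: Dict[int, List[Tuple[str, str]]]) -> Dict[str, List[Tuple[int, int, str]]]:
    """Map each stemmed word to its bucket of (doc id, word number, kind) occurrences."""
    index: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
    for doc_id, words in docs.items():
        for pos, (word, kind) in enumerate(words):
            w = word.lower()
            if w in STOP_WORDS:
                continue  # stop words are not indexed
            index[stem(w)].append((doc_id, pos, kind))
    return index

def docs_with(index, word: str, kind: str) -> Set[int]:
    """Documents in which `word` occurs with the given markup kind."""
    return {doc for doc, _, k in index.get(stem(word.lower()), []) if k == kind}

docs = {
    1: [("Dogs", "title"), ("and", "title"), ("cats", "text")],
    2: [("Dogs", "title"), ("versus", "text"), ("cats", "anchor")],
    3: [("Cats", "title"), ("dogs", "anchor")],
}
index = build_index(docs)
# Dogs in the title AND cats in an anchor:
hits = docs_with(index, "dogs", "title") & docs_with(index, "cats", "anchor")
```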
The most commonly used index structure in commercial systems.
Advantages
• B-trees automatically maintain as many levels of index as is appropriate for the size of the file.
• B-trees manage the space on their blocks, so no overflow blocks are needed.
The structure of B-trees
• The tree is balanced (all paths from the root to a leaf have the same length).
• Typically there are three layers: the root, an intermediate layer, and the leaves.
B-Trees
Keys are distributed among the leaves in sorted order, from left to right.
• At the root there are at least two used pointers (except for a tree holding a single record).
• At a leaf, the last pointer points to the next leaf block to the right.
• At an interior node, all n+1 pointers can be used, and at least ⌈(n+1)/2⌉ are actually used.
Important rules about what can appear in the blocks of a B-tree
B-trees allow lookup, insertion, and deletion of records using few disk I/Os.
1. If the number of keys per block is reasonably large, we rarely need to split or merge blocks, and even when we do, the operation is usually limited to the leaves and their parents.
2. The number of disk I/Os needed to read a record is normally the number of levels of the B-tree plus one (for lookup) or plus two (for insertion or deletion).
Example: suppose 340 key-pointer pairs could fit in one block, and suppose an average block has an occupancy midway between the minimum and the maximum, i.e., a typical block has 255 pointers.
• With a root having 255 children, there are 255^2 = 65,025 leaves.
• Among these leaves we can have 255^3, or about 16.6 million, records.
• That is, files with up to 16.6 million records can be accommodated by a 3-level B-tree.
• The number of disk I/Os can be reduced further by keeping (parts of) the B-tree in main memory.
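The arithmetic in this example can be checked in a few lines. This is only a sketch: `pointer_bounds` assumes an interior node with room for n keys and n + 1 pointers that must use at least ⌈(n+1)/2⌉ of them, and the fan-out of 255 is the example's assumed typical occupancy (the exact midpoint of 171 and 341 is 256).

```python
import math
from typing import Tuple

def pointer_bounds(n: int) -> Tuple[int, int]:
    """(minimum, maximum) pointers used by an interior node holding up to n keys."""
    return math.ceil((n + 1) / 2), n + 1

def records_reachable(fanout: int, levels: int) -> int:
    """Records indexed by a B-tree with `levels` levels if every node,
    leaves included, uses `fanout` pointers."""
    return fanout ** levels

low, high = pointer_bounds(340)      # 171 and 341 for 340 keys per block
leaves = records_reachable(255, 2)   # 255^2 leaves under a root with 255 children
records = records_reachable(255, 3)  # 255^3 records, about 16.6 million
```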
Efficiency of B-tree Index
Disk Failures
Intermittent Failures
Checksums
Stable Storage
Error-Handling Capabilities of Stable Storage
Types of Errors
Intermittent error: a read or write is unsuccessful, although repeated attempts may succeed.
Media decay: a bit or bits become permanently corrupted.
Write failure: we can neither write the data successfully nor retrieve the previously written data.
Disk crash: the entire disk becomes unreadable.
Intermittent Failures
An intermittent failure occurs if we try to read a sector but the correct content of that sector is not delivered to the disk controller.
The read operation tells us whether a sector is good or bad. To check that a write was correct, we read the sector back after writing it.
Checksums
Each sector has some additional bits, called the checksum.
Checksums are set depending on the values of the data bits stored in that sector.
If we use checksums, the probability of reading a bad sector without noticing it is small.
A simple checksum is a parity bit. If the data bits contain an odd number of 1's, we add a parity bit 1; if they contain an even number of 1's, we add a parity bit 0. So the number of 1's, including the parity bit, is always even.
Example:
1. Sequence 01101000 has an odd number of 1's, so the parity bit is 1: 011010001.
2. Sequence 11101110 has an even number of 1's, so the parity bit is 0: 111011100.
Any one-bit error in reading or writing the data bits or their parity bit results in a sequence with odd parity, so the error can be detected.
Error detection can be improved by keeping several parity bits per sector, for example eight parity bits, one for each bit position of the bytes.
If the sector's contents are badly scrambled, the probability is 50% that any one parity bit will detect the error, and the chance that none of the eight does so is only one in 2^8, or 1/256.
In the same way, if n independent parity bits are used, the probability of missing an error is only 1/2^n.
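A small sketch of the parity scheme above, with bit sequences written as Python strings; the function names are illustrative. `byte_position_parity` assumes the eight-parity-bit variant, one bit per bit position, which amounts to the XOR of all bytes in the sector.

```python
from functools import reduce

def parity_bit(bits: str) -> str:
    """Even parity: '1' if the data bits hold an odd number of 1's, else '0'."""
    return "1" if bits.count("1") % 2 == 1 else "0"

def with_parity(bits: str) -> str:
    """Append the parity bit so the total number of 1's is even."""
    return bits + parity_bit(bits)

def parity_ok(stored: str) -> bool:
    """A stored sequence is accepted only if it has even parity."""
    return stored.count("1") % 2 == 0

def byte_position_parity(sector: bytes) -> int:
    """Eight parity bits at once: bit k of the result is the parity of
    bit k across all bytes of the sector (the XOR of the bytes)."""
    return reduce(lambda acc, b: acc ^ b, sector, 0)
```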
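Stable Storage
Stable storage is a policy for recovering from disk failures such as media decay and write failures, in which a sector we overwrite may not afterward read correctly.
Sectors are paired, and each pair represents one sector-contents X, with left and right copies X_L and X_R. To write X, we write X_L and check its parity bits by reading it back, retrying (and, after repeated failures, substituting a spare sector for X_L) until a good value is returned; then we do the same for X_R. To read X, we read X_L, and if its parity is still bad after a number of retries, we read X_R instead.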
Error Handling Capabilities of Stable Storage
Media failures: if one of X_L and X_R fails, X can be read from the other. Only if both fail is X unreadable, and the probability of that is very small.
Write failures: suppose the power fails during a write.
1. If the failure occurs while writing X_L, then X_R remains good (holding the old value of X), and X can be read from X_R.
2. If the failure occurs after X_L has been written, we can read X from X_L, since X_R may or may not hold the correct copy of X.
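A simplified simulation of the stable-storage policy, assuming a boolean `parity_ok` flag stands in for "the checksum verifies" and omitting the retry and spare-sector logic. The class names and the crash flags are hypothetical, introduced only to show why a crash at either point still leaves a readable copy.

```python
from dataclasses import dataclass

@dataclass
class Sector:
    data: bytes = b""
    parity_ok: bool = True  # stands in for "the checksum verifies"

class StableSector:
    """A logical sector X kept as two physical copies X_L and X_R (simplified)."""

    def __init__(self, value: bytes = b""):
        self.left = Sector(value)
        self.right = Sector(value)

    def write(self, value: bytes, crash_during_left: bool = False,
              crash_after_left: bool = False) -> None:
        if crash_during_left:             # power fails mid-write: X_L is garbage
            self.left = Sector(b"?", parity_ok=False)
            return
        self.left = Sector(value)         # write X_L and verify it (retries omitted)
        if crash_after_left:              # power fails before X_R is touched
            return
        self.right = Sector(value)        # only then write X_R

    def read(self) -> bytes:
        for copy in (self.left, self.right):  # try X_L first, then X_R
            if copy.parity_ok:
                return copy.data
        raise IOError("both copies of X are bad")
```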
Recovery from Disk Crashes: Ways to Recover Data
The most serious mode of failure for disks is the "head crash", where data is permanently destroyed.
To reduce the risk of data loss from disk crashes, there are a number of schemes known as RAID (Redundant Arrays of Independent Disks) schemes.
Each of the schemes starts with one or more disks that hold the data and adds one or more disks that hold information completely determined by the contents of the data disks. These are called redundant disks.
Mirroring as a Redundancy Technique
The mirroring scheme is referred to as RAID level 1, a scheme for protection against data loss.
In this scheme we mirror each disk.
One of the disks is called the data disk and the other the redundant disk.
In this case, the only way data can be lost is if there is a second disk crash while the first crash is being repaired.
Parity Blocks
RAID level 4 scheme uses only one redundant disk no matter how many data disks there are.
In the redundant disk, the ith block consists of the parity checks for the ith blocks of all the data disks.
That is, taking the jth bits of the ith blocks of all the data disks and of the redundant disk, there must be an even number of 1's among them, and the redundant disk's bit is chosen to make this condition true.
Parity Blocks – Reading disk
Reading a block of a data disk is the same as reading a block from any disk.
Alternatively, we could read the corresponding block from each of the other disks and compute the block of the disk we want to read by taking the modulo-2 sum.
Example: suppose we want disk 1's block, and disks 2 and 3 (data) and disk 4 (redundant) hold:
disk 2: 10101010
disk 3: 00111000
disk 4: 01100010
If we take the modulo-2 sum of the bits in each column, we get
disk 1: 11110000
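A sketch of the column-wise modulo-2 sum, with each block written as a bit string (an illustrative representation; real blocks would be bytes). XOR of the integers is exactly the column-wise modulo-2 sum, and `format` pads back the leading zeros.

```python
from functools import reduce
from typing import List

def modulo2_sum(blocks: List[str]) -> str:
    """Column-wise modulo-2 sum (XOR) of equal-length bit strings."""
    width = len(blocks[0])
    total = reduce(lambda acc, block: acc ^ int(block, 2), blocks, 0)
    return format(total, f"0{width}b")  # keep leading zeros

# Reading disk 1 without touching it: combine disks 2, 3 and the redundant disk 4.
disk1 = modulo2_sum(["10101010", "00111000", "01100010"])
```

The same function performs failure recovery: the modulo-2 sum of the surviving disks reconstructs the missing one.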
Parity Block - Writing
When we write a new block of a data disk, we need to change the corresponding block of the redundant disk as well.
One approach is to read the corresponding blocks of all the other data disks, compute the modulo-2 sum, and write it to the redundant disk.
With n data disks, this approach requires n-1 reads of data, a write of the data block, and a write of the redundant-disk block.
Total = n+1 disk I/Os
A better approach requires only four disk I/Os:
1. Read the old value of the data block being changed.
2. Read the corresponding block of the redundant disk.
3. Write the new data block.
4. Compute the modulo-2 sum of the old and new data blocks, add it (modulo 2) to the old redundant block, and write the result to the redundant disk.
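A sketch of the four-I/O update, assuming a hypothetical dictionary that maps disk numbers to a single block each (as bit strings); each dictionary read or write stands in for one disk I/O. The redundant block changes in exactly the bit positions where the data block changed.

```python
from typing import Dict

def xor_bits(a: str, b: str) -> str:
    """Modulo-2 sum of two equal-length bit strings."""
    return format(int(a, 2) ^ int(b, 2), f"0{len(a)}b")

def raid4_write(disks: Dict[int, str], data_disk: int, new_block: str, parity_disk: int) -> None:
    """Update one data block and the matching redundant block with four disk I/Os."""
    old_block = disks[data_disk]      # I/O 1: read the old data block
    old_parity = disks[parity_disk]   # I/O 2: read the old redundant block
    disks[data_disk] = new_block      # I/O 3: write the new data block
    # I/O 4: flip exactly the parity bits where the data block changed
    disks[parity_disk] = xor_bits(old_parity, xor_bits(old_block, new_block))
```

The cost stays at four I/Os no matter how many data disks there are, compared with n + 1 for recomputing the parity from every disk.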
Parity Blocks – Failure Recovery
If any one of the disks crashes, we just have to compute the modulo-2 sum of the corresponding blocks of the remaining disks to recover it.
Suppose that disk 2 fails. We need to recompute each block of the replacement disk. We are given the corresponding blocks of the first and third data disks and of the redundant disk, so the situation looks like:
disk 1: 11110000
disk 2: ????????
disk 3: 00111000
disk 4: 01100010
If we take the modulo-2 sum of each column, we deduce that the missing block of disk 2 is 10101010.
An Improvement: RAID 5
RAID 4 is effective in preserving data unless there are two simultaneous disk crashes.
Whatever scheme we use for updating the disks, we need to read and write the redundant disk's block. If there are n data disks, then the number of disk writes to the redundant disk will be n times the average number of writes to any one data disk.
However we do not have to treat one disk as the redundant disk and the others as data disks. Rather, we could treat each disk as the redundant disk for some of the blocks. This improvement is often called RAID level 5.
For instance, if there are n + 1 disks numbered 0 through n, we could treat the ith cylinder of disk j as redundant if j is the remainder when i is divided by n+1.
For example, let n = 3, so there are 4 disks. The first disk, numbered 0, is redundant for its cylinders numbered 4, 8, 12, and so on, because these are the numbers that leave remainder 0 when divided by 4.
Disk 1 is redundant for cylinders numbered 1, 5, 9, and so on; disk 2 is redundant for cylinders 2, 6, 10, ...; and disk 3 is redundant for 3, 7, 11, ....
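The RAID 5 rotation rule is a single remainder. This sketch (function names are illustrative) reproduces the assignments in the example for n = 3.

```python
from typing import List

def redundant_disk(cylinder: int, n: int) -> int:
    """With n + 1 disks numbered 0..n, the disk whose copy of this cylinder holds parity."""
    return cylinder % (n + 1)

def redundant_cylinders(disk: int, n: int, upto: int) -> List[int]:
    """Cylinders below `upto` for which `disk` is the redundant disk."""
    return [i for i in range(upto) if redundant_disk(i, n) == disk]
```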
Coping With Multiple Disk Crashes
• The theory of error-correcting codes known as Hamming codes leads to RAID level 6.
• With this strategy, two simultaneous disk crashes are correctable.
For example, with four data disks (1 through 4) and three redundant disks (5 through 7):
The bits of disk 5 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 3.
The bits of disk 6 are the modulo-2 sum of the corresponding bits of disks 1, 2, and 4.
The bits of disk 7 are the modulo-2 sum of the corresponding bits of disks 1, 3, and 4.
Coping With Multiple Disk Crashes – Reading/Writing
We may read data from any data disk normally.
To write a block of some data disk, we compute the modulo-2 sum of the new and old versions of that block. These bits are then added, in a modulo-2 sum, to the corresponding blocks of all those redundant disks whose sums include the written disk (those that have a 1 in a row in which the written disk also has a 1).
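A sketch of the RAID 6 layout above, using bit strings for blocks and a hypothetical `PARITY_GROUPS` table that encodes which data disks each redundant disk sums. On a write, only the redundant disks whose group contains the written disk are patched with the old/new difference; the test checks that the result matches a full recomputation.

```python
from typing import Dict

# Redundant disk -> the data disks whose modulo-2 sum it stores.
PARITY_GROUPS = {5: (1, 2, 3), 6: (1, 2, 4), 7: (1, 3, 4)}

def xor_all(*blocks: str) -> str:
    """Modulo-2 sum of equal-length bit strings."""
    total = 0
    for block in blocks:
        total ^= int(block, 2)
    return format(total, f"0{len(blocks[0])}b")

def redundant_blocks(data: Dict[int, str]) -> Dict[int, str]:
    """Compute disks 5, 6 and 7 from data disks 1 through 4."""
    return {r: xor_all(*(data[d] for d in group)) for r, group in PARITY_GROUPS.items()}

def raid6_write(disks: Dict[int, str], d: int, new_block: str) -> None:
    """Write data disk d and patch every redundant disk whose group contains d."""
    delta = xor_all(disks[d], new_block)  # modulo-2 sum of old and new versions
    disks[d] = new_block
    for r, group in PARITY_GROUPS.items():
        if d in group:
            disks[r] = xor_all(disks[r], delta)
```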
Records consist of fields. Each record must have a schema, which is stored by the database system. The schema includes the names and data types of the fields and their offsets within the record.
RECORDS
Example:
CREATE TABLE Moviestar (
    name CHAR(30) PRIMARY KEY,
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE
);
Fixed Length Records
[Figure: unaligned layout of a Moviestar record: name at offset 0, address at 30, gender at 286, birthdate at 287; the record ends at byte 297.]
Each record starts at a byte within its block that is a multiple of 4.
All fields within the record start at a byte that is offset from the beginning of the record by a multiple of 4.
So the record should look like this:
[Figure: aligned layout: name at offset 0, address at 32, gender at 288, birthdate at 292; the record occupies 304 bytes.]
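A sketch that computes these offsets by rounding each start up to a multiple of 4. The field sizes are inferred from the unaligned offsets (name 30 bytes, address 256 bytes, i.e. VARCHAR(255) plus a length byte, gender 1 byte, birthdate 10 bytes); the optional `header` parameter is an illustrative addition that also reproduces the 12-byte-header layout of the record-header figure (12, 44, 300, 304, ending at 316).

```python
from typing import List, Tuple

def aligned_layout(sizes: List[int], header: int = 0, align: int = 4) -> Tuple[List[int], int]:
    """Return (field start offsets, padded record length) when the header and
    every field start at a multiple of `align` bytes."""
    def round_up(x: int) -> int:
        return -(-x // align) * align  # ceiling to the next multiple of align

    offsets, pos = [], round_up(header)
    for size in sizes:
        offsets.append(pos)
        pos = round_up(pos + size)  # padding after the field
    return offsets, pos

MOVIESTAR_SIZES = [30, 256, 1, 10]  # name, address, gender, birthdate (inferred)
```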
The following information should also be kept with the record:
1. The record schema (or a pointer to it)
2. The length of the record
3. Timestamps
Many record layouts include a header of some small number of bytes to provide this additional information.
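Record Headers
[Figure: a record whose 12-byte header holds a pointer to the schema, the record length, and a timestamp, followed by name at offset 12, address at 44, gender at 300, and birthdate at 304; the record length is 316.]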
Records representing tuples of a relation are stored in blocks of the disk and moved into main memory when we need to access or update them.
[Figure: a block consisting of a header followed by record 1, record 2, …, record n.]
Records into Blocks
The block header contains the following information:
• Links to one or more other blocks that are part of a network of blocks, such as those used for creating indexes to the tuples of a relation.
• Information about the role played by this block in such a network.
• Information about which relation the tuples of this block belong to.
• Timestamps indicating the time of the block's last modification or access.
A database consists of a server process that provides data from secondary storage to one or more client processes, which are applications using the data.
The server and client processes may be on one machine, or the server and the various clients can be distributed over many machines.
Client-Server Systems
The client application uses a "virtual" address space.
The operating system or DBMS decides which parts of the address space are currently located in main memory, and hardware maps the virtual address space to physical locations in main memory.
The server's data lives in a database address space.
The addresses of this space refer to blocks, and possibly to offsets within the block.
Physical addresses are byte strings that let us determine the place within the secondary storage system where the block or record can be found.
The bytes of a physical address are used to indicate the following information:
• The host to which the storage is attached.
• An identifier for the disk or other device on which the block is located.
• The number of the cylinder of the disk.
• The number of the track within the cylinder.
• The number of the block within the track.
• The offset of the beginning of the record within the block.
Client-Server Systems – Physical Address
Each block or record has a "logical address," which is an arbitrary string of bytes of some fixed length.
A map table, stored on disk in a known location, relates logical to physical addresses.
Client-Server Systems – Logical Address
[Figure: a map table whose entries relate a logical address to a physical address.]
All the information needed for a physical address is found in the map table.
Many combinations of logical and physical addresses yield structured address schemes.
A very useful combination of physical and logical addresses is to keep in each block an offset table that holds the offsets of the records within the block, as suggested in the figure below.
Logical and Structured Addresses
[Figure: a block with a header and an offset table at the front, unused space in the middle, and records 4, 3, 2, 1 placed from the end of the block.]
A block with a table of offsets telling us the position of each record within the block
The address of a record is now the physical address of its block plus the offset of the entry in the block's offset table for that record.
ADVANTAGES
• We can move the record around within the block; only its offset-table entry changes.
• We can even allow the record to move to another block, if the offset-table entries are large enough to hold a forwarding address.
• Finally, should the record be deleted, we have the option of leaving in its offset-table entry a tombstone, a special value that indicates the record has been deleted.
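A minimal sketch of a block with an offset table, assuming a record's address is a hypothetical (block, slot) pair and using `None` as the tombstone. For simplicity records are appended from the front of a `bytearray` rather than placed from the end of the block. Compaction slides records but changes only the offset table, so external addresses stay valid.

```python
from typing import List, Optional, Tuple

class SlottedBlock:
    """A block with an offset table; outside pointers name a slot, not a byte offset."""

    def __init__(self) -> None:
        self.table: List[Optional[Tuple[int, int]]] = []  # slot -> (offset, length); None = tombstone
        self.data = bytearray()

    def insert(self, record: bytes) -> int:
        self.table.append((len(self.data), len(record)))
        self.data += record
        return len(self.table) - 1  # slot number used in the record's address

    def get(self, slot: int) -> Optional[bytes]:
        entry = self.table[slot]
        if entry is None:
            return None  # tombstone: the record was deleted
        off, n = entry
        return bytes(self.data[off:off + n])

    def delete(self, slot: int) -> None:
        self.table[slot] = None  # leave a tombstone; the slot is never reused

    def compact(self) -> None:
        """Slide live records together, updating only the offset table."""
        new_data = bytearray()
        for slot, entry in enumerate(self.table):
            if entry is not None:
                off, n = entry
                self.table[slot] = (len(new_data), n)
                new_data += self.data[off:off + n]
        self.data = new_data
```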
Relational systems need the ability to represent pointers in tuples.
Index structures are composed of blocks that usually have pointers within them.
Thus, we need to study the management of pointers as blocks are moved between main and secondary memory.
Pointer Swizzling
Every block, record, object, or other referenceable data item has two forms of address:
1. Its database address, i.e., its address in the server's database address space.
2. Its memory address, i.e., its address in virtual memory, if the item is currently in main memory.
In secondary storage, we must use the database address of the item. However, when the item is in main memory, we can refer to it by either its database address or its memory address.
We need a table that translates all those database addresses that are currently in virtual memory to their current memory addresses. Such a translation table is suggested in the figure below.
[Figure: a translation table with columns DB-addr (database address) and mem-addr (memory address).]
The translation table turns database addresses into their equivalents in memory
To avoid the cost of translating repeatedly from database addresses to memory addresses, several techniques have been developed that are collectively known as pointer swizzling.
When we move a block from secondary to main memory, pointers within the block may be "swizzled", that is, translated from the database address space to the virtual address space.
A pointer then consists of:
1. A bit indicating whether the pointer is currently a database address or a (swizzled) memory address.
2. The database or memory pointer, as appropriate.
As soon as a block is brought into memory, we locate all its pointers and addresses and enter them into the translation table if they are not already there.
However, we need some mechanism to locate the pointers.
Automatic Swizzling
For example:
1. If the block holds records with a known schema, the schema will tell us where in the records the pointers are found.
2. If the block is used for one of the index structures then the block will hold pointers at known locations.
3. We may keep within the block header a list of where the pointers are.
[Figure: blocks 1 and 2 on disk and in memory after being read into memory, with both swizzled and unswizzled pointers.]
Structure of a pointer when swizzling is used
Leave all pointers unswizzled when the block is first brought into memory.
We enter its address, and the addresses of its pointers, into the translation table, along with their memory equivalents.
If and when we follow a pointer P that is inside some block in memory, we swizzle it.
The difference between on-demand and automatic swizzling is that the latter tries to get all the pointers swizzled quickly and efficiently when the block is loaded into memory.
Swizzling on Demand
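A sketch of swizzling on demand, assuming Python object references play the role of memory addresses and a dictionary simulates secondary storage. `Pointer` carries the swizzled bit described above; `Buffer` is a hypothetical buffer manager. The first follow pays for a translation-table lookup; later follows use the swizzled reference directly.

```python
from dataclasses import dataclass
from typing import Any, Dict

@dataclass
class Pointer:
    swizzled: bool  # the bit: True if `target` is a memory reference
    target: Any     # a database address (str) or the in-memory item itself

class Buffer:
    """Hypothetical buffer manager; `disk` simulates secondary storage."""

    def __init__(self, disk: Dict[str, dict]):
        self.disk = disk
        self.translation: Dict[str, dict] = {}  # database address -> in-memory copy

    def load(self, db_addr: str) -> dict:
        if db_addr not in self.translation:  # bring the item into memory once
            self.translation[db_addr] = dict(self.disk[db_addr])
        return self.translation[db_addr]

    def follow(self, p: Pointer) -> dict:
        if not p.swizzled:                   # swizzle on demand, on first use
            p.target = self.load(p.target)   # one translation-table lookup
            p.swizzled = True
        return p.target                      # later follows skip the table
```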
The possible time saved by swizzling all of a block's pointers at one time must be weighed against the possibility that some swizzled pointers will never be followed.
In that case, any time spent swizzling and unswizzling the pointer will be wasted.
Drawback
Arrange that database pointers look like invalid memory addresses. If so, then we can allow the computer to follow any pointer as if it were in its memory form.
If the pointer happens to be unswizzled, then the memory reference will cause a hardware trap. If the DBMS provides a function that is invoked by the trap, and this function "swizzles" the pointer, then we can follow swizzled pointers in single instructions, and only need to do something more time-consuming when the pointer is unswizzled.
Option
It is possible never to swizzle pointers.
We still need the translation table, so the pointers may be followed in their unswizzled form.
No Swizzling
The application programmer may know whether the pointers in a block are likely to be followed.
This programmer may be able to specify explicitly that a block loaded into memory is to have its pointers swizzled, or the programmer may call for the pointers to be swizzled only as needed.
Programmer Control of Swizzling
When a block is moved from memory back to disk, any pointers within that block must be "unswizzled".
The translation table can be used to associate addresses of the two types in either direction.
However, we do not want each unswizzling operation to require a search of the entire translation table.
Returning Blocks to Disk
If we think of the translation table as a relation, then the problem of finding the memory address associated with a database address x can be expressed as the query:
SELECT memAddr FROM TranslationTable
WHERE dbAddr = x;
If we want to support the reverse query, then we need an index on attribute memAddr as well:
SELECT dbAddr FROM TranslationTable WHERE memAddr = y;
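A sketch of the translation table with an "index" in each direction, assuming two Python dictionaries stand in for indexes on dbAddr and memAddr. The class name and the address formats in the test are hypothetical. Both queries then run without scanning the whole table, which is what unswizzling needs.

```python
from typing import Dict, Optional

class TranslationTable:
    """Translation table indexed on both dbAddr and memAddr (illustrative sketch)."""

    def __init__(self) -> None:
        self.by_db: Dict[str, int] = {}   # index on dbAddr
        self.by_mem: Dict[int, str] = {}  # index on memAddr

    def add(self, db_addr: str, mem_addr: int) -> None:
        self.by_db[db_addr] = mem_addr
        self.by_mem[mem_addr] = db_addr

    def mem_of(self, db_addr: str) -> Optional[int]:
        return self.by_db.get(db_addr)    # SELECT memAddr ... WHERE dbAddr = x

    def db_of(self, mem_addr: int) -> Optional[str]:
        return self.by_mem.get(mem_addr)  # SELECT dbAddr ... WHERE memAddr = y

    def remove(self, db_addr: str) -> None:
        """Drop an entry, e.g. when its block leaves memory; keeps both indexes in sync."""
        mem_addr = self.by_db.pop(db_addr)
        del self.by_mem[mem_addr]
```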
A block in memory is said to be pinned if it cannot at the moment be written back to disk safely.
A bit telling whether or not a block is pinned can be located in the header of the block.
Pinned Records and Blocks
Suppose a block B1 has within it a swizzled pointer to some data item in block B2, and B2 is then written back to disk and its buffer reused by another block.
If we follow the pointer in B1, it will lead us to the buffer, which no longer holds B2; in effect, the pointer has become dangling.
A block, like B2, that is referred to by a swizzled pointer from somewhere else is therefore pinned.
Reason for block to be pinned
If a block is pinned, we must either unpin it or let the block remain in memory, occupying space that could otherwise be used for some other block.
To unpin a block that is pinned because of swizzled pointers from outside, we must "unswizzle" any pointers to it.
Consequently, the translation table must record, for each database address whose data item is in memory, the places in memory where swizzled pointers to that item exist.
Two possible approaches are:
1. Keep the list of references to a memory address as a linked list attached to the entry for that address in the translation table.
2. If memory addresses are significantly shorter than database addresses, we can create the linked list in the space used for the pointers themselves.
That is, each space used for a database pointer is replaced by
(a) the swizzled pointer, and
(b) another pointer that forms part of a linked list of all occurrences of this pointer.
[Figure: the translation-table entry for database address x and memory address y heads a linked list through every place in memory that holds the swizzled pointer y.]
A linked list of occurrences of a swizzled pointer
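A sketch of approach 1 (the list of references attached to the translation-table entry), assuming records are Python dictionaries, a swizzled pointer is a direct object reference, and a Python list replaces the linked list. The `Pinning` class and its method names are hypothetical. A block is pinned while any swizzled pointer to it exists; unpinning unswizzles each recorded pointer back to the database address.

```python
from typing import Any, Dict, List, Tuple

class Entry:
    """Translation-table entry plus the places holding swizzled pointers to the item."""

    def __init__(self, db_addr: str, item: Any):
        self.db_addr = db_addr
        self.item = item
        self.refs: List[Tuple[dict, str]] = []  # (record, field) holding a swizzled pointer

class Pinning:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def load(self, db_addr: str, item: Any) -> None:
        self.entries[db_addr] = Entry(db_addr, item)

    def swizzle(self, record: dict, field: str) -> None:
        entry = self.entries[record[field]]  # field currently holds a database address
        record[field] = entry.item           # replace it with a memory reference
        entry.refs.append((record, field))   # remember where the swizzled pointer lives

    def is_pinned(self, db_addr: str) -> bool:
        return bool(self.entries[db_addr].refs)

    def unpin(self, db_addr: str) -> None:
        entry = self.entries[db_addr]
        for record, field in entry.refs:
            record[field] = db_addr          # unswizzle back to the database address
        entry.refs.clear()
```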
Records With Variable-Length Fields
A simple but effective scheme is to put all fixed length fields ahead of the variable-length fields. We then place in the record header:
1. The length of the record.
2. Pointers to (i.e., offsets of) the beginnings of all the variable-length fields.
However, if the variable-length fields always appear in the same order, then the first of them needs no pointer; we know it immediately follows the fixed-length fields.
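A sketch of this layout using Python's `struct` module. The concrete layout is an assumption for illustration: a header with a 4-byte record length and a 4-byte offset of `address`, then the fixed-length fields gender (1 byte) and birthdate (10 bytes), then `name`, which needs no pointer because it immediately follows the fixed fields, and finally `address`.

```python
import struct
from typing import Tuple

HEADER = struct.Struct("<II")   # record length, offset of address (little-endian)
FIXED = struct.Struct("<c10s")  # gender CHAR(1), birthdate as 10 bytes

def encode(gender: bytes, birthdate: bytes, name: bytes, address: bytes) -> bytes:
    name_start = HEADER.size + FIXED.size  # name follows the fixed fields: no pointer
    addr_start = name_start + len(name)
    total = addr_start + len(address)
    return HEADER.pack(total, addr_start) + FIXED.pack(gender, birthdate) + name + address

def decode(rec: bytes) -> Tuple[bytes, bytes, bytes, bytes]:
    total, addr_start = HEADER.unpack_from(rec, 0)
    gender, birthdate = FIXED.unpack_from(rec, HEADER.size)
    name_start = HEADER.size + FIXED.size
    return gender, birthdate, rec[name_start:addr_start], rec[addr_start:total]
```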
Records With Repeating Fields
A similar situation occurs if a record contains a variable number of occurrences of a field F, but the field itself is of fixed length. It is sufficient to group all occurrences of field F together and put in the record header a pointer to the first. We can locate all the occurrences of the field F as follows. Let the number of bytes devoted to one instance of field F be L. We then add to the offset for the field F all integer multiples of L, starting at 0, then L, 2L, 3L, and so on. Eventually, we reach the offset of the field following F, whereupon we stop.
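The procedure just described is a stepped range: start at F's offset, step by L, stop at the next field's offset. A minimal sketch with illustrative names and a made-up record in the test:

```python
from typing import List

def occurrence_offsets(f_offset: int, next_offset: int, length: int) -> List[int]:
    """Offsets of every occurrence of a repeating field F of `length` (L) bytes,
    stopping at the offset of the field that follows F."""
    return list(range(f_offset, next_offset, length))

def occurrences(record: bytes, f_offset: int, next_offset: int, length: int) -> List[bytes]:
    """The bytes of each occurrence of F."""
    return [record[o:o + length] for o in occurrence_offsets(f_offset, next_offset, length)]
```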
An alternative representation is to keep the record of fixed length and put the variable-length portion, be it fields of variable length or fields that repeat an indefinite number of times, on a separate block. In the record itself we keep:
1. Pointers to the place where each repeating field begins, and
2. Either how many repetitions there are, or where the repetitions end.
Storing variable-length fields separately from the record
Variable-Format Records
The simplest representation of variable-format records is a sequence of tagged fields, each of which consists of:
1. Information about the role of this field, such as:
(a) the attribute or field name,
(b) the type of the field, if it is not apparent from the field name and some readily available schema information, and
(c) the length of the field, if it is not apparent from the type.
2. The value of the field.
There are at least two reasons why tagged fields would make sense.
1. Information integration applications - Sometimes, a relation has been constructed from several earlier sources, and these sources have different kinds of information. For instance, our movie star information may have come from several sources, one of which records birthdates, some give addresses, others do not, and so on. If there are not too many fields, we are probably best off leaving NULL those values we do not know. However, if there are many sources with many different kinds of information, there may be too many NULLs, and we can save significant space by tagging only the values we do have.
2. Records with a very flexible schema - If many fields of a record can repeat and/or not appear at all, then even if we know the schema, tagged fields may be useful. For instance, medical records may contain information about many tests, but there are thousands of possible tests, and each patient has results for relatively few of them.
A record with tagged fields
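A sketch of a tagged-field encoding with `struct`. The byte format is an assumption for illustration, not a standard: each field is a 1-byte name length, the UTF-8 field name, a 1-byte type code (`I` for a 64-bit integer, `S` for a string), a 2-byte value length, and the value. Only the fields a record actually has are stored.

```python
import struct
from typing import Dict, Union

Value = Union[str, int]

def encode_tagged(fields: Dict[str, Value]) -> bytes:
    out = bytearray()
    for name, value in fields.items():
        if isinstance(value, int):
            kind, payload = b"I", struct.pack("<q", value)
        else:
            kind, payload = b"S", value.encode("utf-8")
        tag = name.encode("utf-8")
        out += struct.pack("<B", len(tag)) + tag + kind  # role of the field: name and type
        out += struct.pack("<H", len(payload)) + payload  # length, then the value
    return bytes(out)

def decode_tagged(data: bytes) -> Dict[str, Value]:
    fields: Dict[str, Value] = {}
    pos = 0
    while pos < len(data):
        (tag_len,) = struct.unpack_from("<B", data, pos); pos += 1
        name = data[pos:pos + tag_len].decode("utf-8"); pos += tag_len
        kind = data[pos:pos + 1]; pos += 1
        (n,) = struct.unpack_from("<H", data, pos); pos += 2
        payload = data[pos:pos + n]; pos += n
        fields[name] = struct.unpack("<q", payload)[0] if kind == b"I" else payload.decode("utf-8")
    return fields
```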
Records That Do Not Fit in a Block
Some data types have very large values, such as video clips. These large values have a variable length, but even if the length is fixed for all values of the type, we need special techniques to represent these values. In this section we consider a technique called "spanned records" that can be used to manage records that are larger than blocks.
Spanned records are also useful in situations where records are smaller than blocks, but packing whole records into blocks wastes significant amounts of space. For both these reasons, it is sometimes desirable to allow records to be split across two or more blocks. The portion of a record that appears in one block is called a record fragment.
If records can be spanned, then every record and record fragment requires some extra header information:
1. Each record or fragment header must contain a bit telling whether or not it is a fragment.
2. If it is a fragment, then it needs bits telling whether it is the first or last fragment for its record.
3. If there is a next and/or previous fragment for the same record, then the fragment needs pointers to these other fragments.
Storing spanned records across blocks
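A sketch of splitting a record into fragments carrying the header information listed above. Assumptions: fragments go into consecutive hypothetical block numbers, a "pointer" is just that block number, and `space_per_block` is the room left after the fragment header.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Fragment:
    is_fragment: bool          # header bit: is this a piece of a larger record?
    is_first: bool             # header bit: first fragment of its record
    is_last: bool              # header bit: last fragment of its record
    prev_block: Optional[int]  # hypothetical pointer to the previous fragment's block
    next_block: Optional[int]  # hypothetical pointer to the next fragment's block
    payload: bytes

def span(record: bytes, space_per_block: int, first_block: int) -> List[Fragment]:
    """Split a record across blocks first_block, first_block + 1, ..."""
    chunks = [record[i:i + space_per_block]
              for i in range(0, len(record), space_per_block)] or [b""]
    k = len(chunks)
    if k == 1:  # fits in one block: a whole record, not a fragment
        return [Fragment(False, False, False, None, None, chunks[0])]
    return [Fragment(True, j == 0, j == k - 1,
                     first_block + j - 1 if j > 0 else None,
                     first_block + j + 1 if j < k - 1 else None,
                     chunk)
            for j, chunk in enumerate(chunks)]

def reassemble(fragments: List[Fragment]) -> bytes:
    return b"".join(f.payload for f in fragments)
```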
BLOBs
Binary Large OBjects = BLOBs. BLOBs can be images, movies, audio files, and other very large values that can be stored in files.
Storing BLOBs
– Stored in several blocks.
– Preferable to store them consecutively on a cylinder, or across multiple disks, for efficient retrieval.
Retrieving BLOBs
– A client retrieving a 2-hour movie may not want it all at the same time.
– Retrieving a specific part of the large data requires an index structure to make it efficient. (Example: an index by seconds on a movie BLOB.)
Column Stores
An alternative to storing tuples as records is to store each column as a record. Since an entire column of a relation may occupy far more than a single block, these records may span many blocks, much as long files do. If we keep the values in each column in the same order, then we can reconstruct the relation from the column records.
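A sketch of the column-store idea with a small hypothetical relation: each column becomes one list, and because every column keeps the same tuple order, zipping the columns back together reconstructs the relation.

```python
from typing import Any, Dict, List, Sequence, Tuple

def to_columns(rows: Sequence[Tuple[Any, ...]], attrs: Sequence[str]) -> Dict[str, List[Any]]:
    """Store each attribute's values as one column record, in tuple order."""
    return {a: [row[i] for row in rows] for i, a in enumerate(attrs)}

def to_rows(columns: Dict[str, List[Any]], attrs: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Reconstruct tuples by taking the i-th value of every column."""
    return list(zip(*(columns[a] for a in attrs)))

attrs = ["x", "y"]
rows = [(1, "a"), (2, "b"), (3, "c")]
columns = to_columns(rows, attrs)
```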
Consider this relation
INTRODUCTION
What is a Record? A record is a single, implicitly structured data item in a database table. A record is also called a tuple.
What is the definition of Record Modification?
We say records are modified when a data manipulation operation (insert, update, or delete) is performed.
STRUCTURE OF A RECORD
RECORD STRUCTURE FOR A PERSON TABLE
CREATE TABLE PERSON (NAME CHAR(30), ADDRESS CHAR(256), GENDER CHAR(1), BIRTHDATE CHAR(10));
TYPES OF RECORDS
FIXED LENGTH RECORDS
CREATE TABLE SJSUSTUDENT (STUDENT_ID INT(9) NOT NULL, PHONE_NO INT(10) NOT NULL);
VARIABLE LENGTH RECORDS
CREATE TABLE SJSUSTUDENT (STUDENT_ID INT(9) NOT NULL, NAME VARCHAR(100), ADDRESS VARCHAR(100), PHONE_NO INT(10) NOT NULL);
RECORD MODIFICATION
Modification of records: insert, update, delete.
• Issues arise even with fixed-length records.
• There are more issues with variable-length records.
STRUCTURE OF A BLOCK & RECORDS
Various records are grouped together and stored in blocks.
STRUCTURE OF BLOCK
BLOCKS & RECORDS
If records need not be in any particular order, then just find a block with enough empty space.
We keep track of all records/tuples in a relation/table using index structures and file-organization concepts.
Inserting New Records
If records are not required to be in a particular order, just find a block with enough empty space and place the record in that block (e.g., heap files).
What if the records are to be kept in a particular order (e.g., sorted by primary key)? Locate the appropriate block and check whether space is available in it; if so, place the record in the block.
INSERTING NEW RECORDS
We may have to slide the records in the block to place the new record at the appropriate place, and suitably edit the block header.
What If The Block Is Full?
We need to keep the record in a particular block, but the block is full. How do we deal with it?
We find room outside the block. There are two approaches to finding room for the record:
1. Find space on a nearby block.
2. Create an overflow block.
Approaches to finding room for record
Find space on a nearby block: if block b1 has no space but space is available on block b2, move records of b1 to b2. If there are external pointers to the records of b1 that were moved to b2, leave a forwarding address in the offset table of b1.
Create an overflow block: each block b has in its header a pointer to an overflow block, where additional records that belong in b can be placed.
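A sketch of ordered insertion with sliding and an overflow chain, assuming a block is modeled as a sorted list of keys with a fixed capacity and a header pointer to an optional overflow block. The `Block` class is hypothetical; `bisect.insort` plays the role of sliding later records to make room.

```python
import bisect
from typing import List, Optional

class Block:
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.keys: List[int] = []            # records kept sorted by key
        self.overflow: Optional["Block"] = None  # header pointer to an overflow block

    def insert(self, key: int) -> None:
        if len(self.keys) < self.capacity:
            bisect.insort(self.keys, key)    # slide later records to make room
        else:
            if self.overflow is None:        # block is full: create an overflow block
                self.overflow = Block(self.capacity)
            self.overflow.insert(key)        # overflow blocks may chain further

    def all_keys(self) -> List[int]:
        """Keys of this block followed by those in its overflow chain."""
        rest = self.overflow.all_keys() if self.overflow else []
        return self.keys + rest
```

Note that records in an overflow block are not in order relative to the main block, which is one reason overflow chains are eventually reorganized.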
Deletion
Try to reclaim the space freed by the deletion of a record.
If an offset table is used for storing information about the records of the block, then rearrange/slide the remaining records.
If sliding records is not possible, then maintain a SPACE-AVAILABLE LIST to keep track of the space available in the block.
Tombstone
What about pointers to deleted records? A tombstone is placed in place of each deleted record.
A tombstone is a bit placed at the first byte of the deleted record to indicate whether the record was deleted (0 – not deleted, 1 – deleted).
A tombstone is permanent.
Updating Records
For fixed-length records, there is no effect on the storage system.
For variable-length records:
• If the length increases, treat it like an insertion: "slide the records".
• If the length decreases, treat it like a deletion: update the space-available list and recover the space, or eliminate the overflow blocks.
Secondary Storage Management
Database systems always involve secondary storage, such as disks and other devices that store large amounts of data that persist over time.
The Memory Hierarchy
A typical computer system has several different components in which data may be stored.
These components have data capacities ranging over at least seven orders of magnitude and also have access speeds ranging over seven or more orders of magnitude.
The memory hierarchy from the textbook is as follows:
Cache
The lowest level of the hierarchy is a cache. On-board cache is found on the same chip as the microprocessor itself, and additional level-2 cache is found on another chip.
Data and instructions are moved to cache from main memory when they are needed by the processor.
Cache data can be accessed by the processor in a few nanoseconds.
Main Memory
In the center of the action is the computer's main memory. We may think of everything that happens in the computer, instruction executions and data manipulations, as working on information that is resident in main memory.
Typical times to access data from main memory to the processor or cache are in the 10-100 nanosecond range
Secondary Storage
Essentially every computer has some sort of secondary storage, which is a form of storage that is both significantly slower and significantly more capacious than main memory.
Moving data between disk and main memory takes around 10 milliseconds, even for a single byte, since the disk transfers whole blocks.
Tertiary Storage
As capacious as a collection of disk units can be, there are databases much larger than what can be stored on the disk(s) of a single machine, or even of a substantial collection of machines.
Tertiary storage devices have been developed to hold data volumes measured in terabytes.
Tertiary storage is characterized by significantly higher read/write times than secondary storage, but also by much larger capacities and lower cost per byte than is available from magnetic disks.
Transfer of Data Between Levels
Normally, data moves between adjacent levels of the hierarchy.
At the secondary and tertiary levels, accessing the desired data or finding the desired place to store data takes a great deal of time, so each level is organized to transfer large amounts of data to or from the level below whenever any data at all is needed.
The disk is organized into disk blocks, and entire blocks are moved to or from a contiguous section of main memory called a buffer.
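The block-at-a-time transfer can be illustrated with a small Python sketch. It assumes an `io.BytesIO` object standing in for a disk and a 4096-byte block size; `read_byte` is a hypothetical helper. Even when only one byte is wanted, the whole block containing it is copied into the buffer.

```python
import io

BLOCK_SIZE = 4096  # assumed block size in bytes

def read_byte(disk, pos, buffer):
    """Fetch the byte at `pos` by transferring its entire block into `buffer`."""
    block_no = pos // BLOCK_SIZE
    disk.seek(block_no * BLOCK_SIZE)
    n = disk.readinto(buffer)          # one whole-block transfer
    offset = pos % BLOCK_SIZE
    if offset >= n:
        raise IndexError("position past end of disk")
    return buffer[offset], block_no

disk = io.BytesIO(bytes(range(256)) * 64)   # 16384 bytes = 4 blocks
buffer = bytearray(BLOCK_SIZE)              # a buffer in "main memory"
value, block_no = read_byte(disk, 5000, buffer)
print(value, block_no)                      # 136 1
```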
Volatile and Nonvolatile Storage
A volatile device "forgets" what is stored in it when the power goes off.
A nonvolatile device, on the other hand, is expected to keep its contents intact even for long periods when the device is turned off or there is a power failure.
Magnetic and optical materials hold their data in the absence of power.
Thus, essentially all secondary and tertiary storage devices are nonvolatile. On the other hand, main memory is generally volatile.
Virtual Memory
When we write programs, the data we use (program variables, files read, and so on) occupies a virtual memory address space.
Many machines use a 32-bit address space; that is, there are 2^32 bytes, or 4 gigabytes.
The operating system manages virtual memory, keeping some of it in main memory and the rest on disk. Transfer between memory and disk is in units of disk blocks.
Disks
The use of secondary storage is one of the important characteristics of a DBMS, and secondary storage is almost exclusively based on magnetic disks.
Mechanics of Disks
The two principal moving pieces of a disk drive are a disk assembly and a head assembly.
The disk assembly consists of one or more circular platters that rotate around a central spindle.
The upper and lower surfaces of the platters are covered with a thin layer of magnetic material, on which bits are stored.
A typical disk format from the textbook is shown below:
0’s and 1’s are represented by different patterns in the magnetic material. A common diameter for disk platters is 3.5 inches. The disk is organized into tracks, which are concentric circles on a single platter. The tracks at a fixed radius from the center, taken across all the surfaces, form one cylinder.
A top view of a disk surface from the textbook is shown below:
Tracks are organized into sectors, which are segments of the circle separated by gaps that are not magnetized to represent either 0’s or 1’s. The second movable piece, the head assembly, holds the disk heads.
The Disk Controller
One or more disk drives are controlled by a disk controller, which is a small processor capable of:
Controlling the mechanical actuator that moves the head assembly to position the heads at a particular radius.
Transferring bits between the desired sector and the main memory.
Selecting a surface from which to read or write, and selecting a sector from the track on that surface that is under the head.
A simple single-processor computer system from the textbook is shown below:
Disk Access Characteristics:
Seek Time: The disk controller positions the head assembly at the cylinder containing the track on which the block is located. The time to do so is the seek time.
Rotational Latency: The disk controller waits while the first sector of the block moves under the head. This time is called the rotational latency.
Transfer Time: All the sectors and the gaps between them pass under the head, while the disk controller reads or writes data in these sectors. This delay is called the transfer time. The sum of the seek time, rotational latency, and transfer time is the latency of the disk.
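A rough latency calculation under assumed drive parameters: an average seek time of 6.5 ms, 7200 RPM, 8 sectors per block, and 256 sectors per track (all hypothetical values). The sketch takes the average rotational latency as half a rotation and ignores the gaps between sectors when computing transfer time, so it slightly underestimates the true transfer time.

```python
def disk_latency_ms(seek_ms, rpm, sectors_per_block, sectors_per_track):
    """Seek time + average rotational latency + transfer time, in milliseconds."""
    rotation_ms = 60_000 / rpm                 # time for one full rotation
    rotational_latency_ms = rotation_ms / 2    # on average, wait half a rotation
    # Fraction of the track the block occupies (gaps ignored in this sketch).
    transfer_ms = rotation_ms * sectors_per_block / sectors_per_track
    return seek_ms + rotational_latency_ms + transfer_ms

latency = disk_latency_ms(seek_ms=6.5, rpm=7200,
                          sectors_per_block=8, sectors_per_track=256)
print(round(latency, 2))   # 10.93
```

With these numbers the seek and rotational latency dominate; the transfer itself is only about a quarter of a millisecond.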
B-Trees
A B-tree organizes its blocks into a tree. The tree is balanced, meaning that all paths from the root to a leaf have the same length. Typically, there are three layers in a B-tree: the root, an intermediate layer, and leaves, but any number of layers is possible.
Functionalities of B-Trees
• B-Trees automatically maintain as many levels of index as is appropriate for the size of the file being indexed.
• B-Trees manage the space on the blocks they use so that every block is between half used and completely full. No overflow blocks are needed.
Structure of B-Trees
A B-tree typically has three layers: the root, an intermediate layer, and leaves.
In a B-Tree, each block has space for n search-key values and n+1 pointers.
B-Tree Example (n=3)
Root: [100]
Interior nodes: [30] and [120 | 150 | 180]
Leaves: [3 5 11] [30 35] [100 101 110] [120 130] [150 156 179] [180 200]
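Sample non-leaf node (keys 57 | 81 | 95):
  pointer 1: to keys k < 57
  pointer 2: to keys 57 <= k < 81
  pointer 3: to keys 81 <= k < 95
  pointer 4: to keys k >= 95
Sample leaf node (keys 57 | 81 | 95), reached from a non-leaf node:
  pointer 1: to the record with key 57
  pointer 2: to the record with key 81
  pointer 3: to the record with key 95
  last pointer: to the next leaf in sequence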
In the textbook’s notation (n=3): a leaf such as [30 35] and a non-leaf such as [30].
Size of nodes: n+1 pointers and n keys (fixed).
We don’t want nodes to be too empty, so each node uses at least:
  Non-leaf: ⌈(n+1)/2⌉ pointers
  Leaf: ⌊(n+1)/2⌋ pointers to data
Full node vs. minimum node (n=3):
  Non-leaf: full [120 | 150 | 180], minimum [30]
  Leaf: full [3 5 11], minimum [30 35]
Note: a pointer counts even if it is null.
B-tree rules (tree of order n)
(1) All leaves are at the same lowest level (balanced tree).
(2) Pointers in leaves point to records, except for the “sequence pointer”.

Node type             Max ptrs   Max keys   Min ptrs (to data)   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉            ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋            ⌊(n+1)/2⌋
Root                  n+1        n          1                    1
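The bounds in the rules table can be computed directly. A short Python sketch, assuming an integer order n; `bounds` and `keys_ok` are hypothetical helpers, and the root row is taken as given in the table. The integer expressions `(n + 2) // 2` and `(n + 1) // 2` compute ⌈(n+1)/2⌉ and ⌊(n+1)/2⌋ without floating point.

```python
def bounds(n, kind):
    """(max pointers, max keys, min pointers, min keys) for a node of a
    B-tree of order n; kind is "non-leaf", "leaf", or "root"."""
    if kind == "non-leaf":
        min_ptrs = (n + 2) // 2          # ceil((n+1)/2) child pointers
        return n + 1, n, min_ptrs, min_ptrs - 1
    if kind == "leaf":
        min_ptrs = (n + 1) // 2          # floor((n+1)/2) pointers to data
        return n + 1, n, min_ptrs, min_ptrs
    if kind == "root":
        return n + 1, n, 1, 1            # as given in the table
    raise ValueError(f"unknown node kind: {kind}")

def keys_ok(n, kind, num_keys):
    """True if a node of this kind may hold num_keys keys."""
    _, max_keys, _, min_keys = bounds(n, kind)
    return min_keys <= num_keys <= max_keys

print(bounds(3, "non-leaf"))     # (4, 3, 2, 1)
print(bounds(3, "leaf"))         # (4, 3, 2, 2)
print(keys_ok(3, "leaf", 1))     # False
```

For n=3 this reproduces the minimum nodes in the example: a non-leaf may hold a single key such as [30], but a leaf needs at least two keys such as [30 35].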
Applications of B-trees
1. The search key of the B-tree is the primary key for the data file, and the index is dense. That is, there is one key-pointer pair in a leaf for every record of the data file. The data file may or may not be sorted by primary key.
2. The data file is sorted by its primary key, and the B-tree is a sparse index with one key-pointer pair at a leaf for each block of the data file.
3. The data file is sorted by an attribute that is not a key, and this attribute is the search key for the B-tree. For each key value K that appears in the data file there is one key-pointer pair at a leaf. That pointer goes to the first of the records that have K as their sort-key value.
Lookup in B-Trees
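Suppose we want to find a record with search key 40. We start at the root, whose only key is 13; since 40 >= 13, we follow the pointer to the right subtree. At the interior node [23 | 31 | 43], 31 <= 40 < 43, so we follow the third pointer to the leaf [31 37 41]. Key 40 is not in that leaf, so there is no record with search key 40.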
Looking for key 40 (not present)
Root: [13]
Interior nodes: [7] and [23 | 31 | 43]
Leaves: [2 3 5] [7 11] [13 17 19] [23 29] [31 37 41] [43 47]
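The descent just described can be sketched in Python. The `Node` class, `find_leaf`, `lookup`, and `build_example` are hypothetical names; leaves store only keys (the record pointers are omitted), and the tree is built by hand to match the lookup example (root 13, interior nodes [7] and [23 31 43]). `bisect.bisect_right` picks the child pointer: it counts how many keys are <= the search key.

```python
import bisect

class Node:
    """Hypothetical in-memory B-tree node. Interior nodes have children;
    leaves have children=None (record pointers are omitted here)."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children
        self.next_leaf = None   # sequence pointer, used only by leaves

def find_leaf(node, key):
    """Descend from the root to the only leaf that could hold `key`."""
    while node.children is not None:
        # Child i covers keys k with keys[i-1] <= k < keys[i].
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node

def lookup(root, key):
    return key in find_leaf(root, key).keys

def build_example():
    """Root [13] over interior nodes [7] and [23 31 43]."""
    leaves = [Node(k) for k in ([2, 3, 5], [7, 11], [13, 17, 19],
                                [23, 29], [31, 37, 41], [43, 47])]
    for left, right in zip(leaves, leaves[1:]):
        left.next_leaf = right
    return Node([13], [Node([7], leaves[:2]),
                       Node([23, 31, 43], leaves[2:])])

root = build_example()
print(find_leaf(root, 40).keys)  # [31, 37, 41]
print(lookup(root, 40))          # False
print(lookup(root, 41))          # True
```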
Range Queries
B-trees are also useful for queries in which a range of values is asked for, such as:
SELECT * FROM R WHERE R.k >= 10 AND R.k <= 25;
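A range query can be sketched as one descent to the leaf that would hold the low end of the range, followed by a walk along the leaves' sequence pointers until a key exceeds the high end. This Python sketch reuses the hypothetical `Node` layout and the prime-key example tree from the lookup discussion, redefined here so the block runs on its own; `range_query` is an illustrative name.

```python
import bisect

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children   # None for a leaf
        self.next_leaf = None      # sequence pointer (leaves only)

def build_example():
    leaves = [Node(k) for k in ([2, 3, 5], [7, 11], [13, 17, 19],
                                [23, 29], [31, 37, 41], [43, 47])]
    for left, right in zip(leaves, leaves[1:]):
        left.next_leaf = right
    return Node([13], [Node([7], leaves[:2]),
                       Node([23, 31, 43], leaves[2:])])

def range_query(root, low, high):
    """Return all keys k with low <= k <= high, in sorted order."""
    node = root
    while node.children is not None:          # descend as for a lookup of `low`
        node = node.children[bisect.bisect_right(node.keys, low)]
    result = []
    while node is not None:                   # then scan leaves left to right
        for k in node.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        node = node.next_leaf
    return result

root = build_example()
print(range_query(root, 10, 25))   # [11, 13, 17, 19, 23]
```

Only the leaves that overlap the range are read after the single root-to-leaf descent, which is why the sequence pointers matter for this kind of query.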
Insert into B-tree
(a) Simple case: space available in leaf
(b) Leaf overflow
(c) Non-leaf overflow
(d) New root
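(a) Insert key = 32 (n=3)
Tree fragment: root key 100 above interior node [30]; leaves [3 5 11] [30 31]
Key 32 fits in the leaf [30 31], which becomes [30 31 32]; no other node changes.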
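(b) Insert key = 7 (n=3)
Tree fragment: interior node [30]; leaves [3 5 11] [30 31]
The leaf [3 5 11] has no room for 7, so it splits into [3 5] and [7 11]. Key 7 is inserted into the parent, which becomes [7 30].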
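(c) Insert key = 160 (n=3)
Tree fragment: root [100]; interior node [120 150 180]; leaves [150 156 179] [180 200]
Key 160 belongs in the full leaf [150 156 179], which splits into [150 156] and [160 179]. Key 160 must then go into the parent [120 150 180], which is also full; it splits into [120 150] and [180], and 160 moves up into the root, which becomes [100 160].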
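(d) New root: insert key = 45 (n=3)
Before: root [10 20 30]; leaves [1 2 3] [10 12] [20 25] [30 32 40]
The leaf [30 32 40] splits into [30 32] and [40 45], and key 40 must go into the root, which is full. The root splits into [10 20] and [40], and the middle key 30 becomes a new root [30], so the tree grows by one level.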
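A compact Python sketch of insertion cases (a) through (d), assuming an in-memory tree of order N = 3 with unique keys; `Node`, `_insert`, and `insert` are hypothetical names and record pointers are omitted. In this sketch a full leaf keeps its first ⌈(N+1)/2⌉ keys and copies the first key of the new right leaf up to the parent, while a full interior node keeps ⌈(N+2)/2⌉ pointers and moves its middle key up. Those choices reproduce the outcomes described for keys 7, 160, and 45.

```python
import bisect

N = 3  # order: at most N keys per node, as in the insertion examples

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children   # None for a leaf
        self.next_leaf = None      # sequence pointer (leaves only)

def _insert(node, key):
    """Insert `key` in the subtree rooted at `node`. Returns None, or
    (separator, new_right_sibling) when `node` had to split."""
    if node.children is None:                     # at a leaf
        bisect.insort(node.keys, key)
        if len(node.keys) <= N:
            return None                           # (a) room in the leaf
        keep = (N + 2) // 2                       # (b) ceil((N+1)/2) keys stay
        right = Node(node.keys[keep:])
        node.keys = node.keys[:keep]
        right.next_leaf, node.next_leaf = node.next_leaf, right
        return right.keys[0], right               # first key is copied up
    i = bisect.bisect_right(node.keys, key)
    split = _insert(node.children[i], key)
    if split is None:
        return None
    sep, new_child = split
    node.keys.insert(i, sep)                      # separator between the halves
    node.children.insert(i + 1, new_child)
    if len(node.keys) <= N:
        return None
    keep = (N + 3) // 2                           # (c) ceil((N+2)/2) pointers stay
    up = node.keys[keep - 1]                      # middle key moves up
    right = Node(node.keys[keep:], node.children[keep:])
    node.keys, node.children = node.keys[:keep - 1], node.children[:keep]
    return up, right

def insert(root, key):
    """Insert `key` and return the (possibly new) root."""
    split = _insert(root, key)
    if split is None:
        return root
    sep, right = split
    return Node([sep], [root, right])             # (d) the tree grows a level

leaves = [Node(k) for k in ([1, 2, 3], [10, 12], [20, 25], [30, 32, 40])]
for a, b in zip(leaves, leaves[1:]):
    a.next_leaf = b
root = insert(Node([10, 20, 30], leaves), 45)     # example (d)
print(root.keys)                                  # [30]
print([c.keys for c in root.children])            # [[10, 20], [40]]
```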
CS 245 Notes 4 148
Deletion from B-tree
(a) Simple case (no example)
(b) Coalesce with neighbor (sibling)
(c) Re-distribute keys
(d) Cases (b) or (c) at non-leaf
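(b) Coalesce with sibling: delete 50 (n=4)
Before: parent [10 40 100]; leaves [10 20 30] [40 50]
After deleting 50, the leaf [40] has fewer than the minimum ⌊(n+1)/2⌋ = 2 keys. Its keys and those of its sibling fit in one node, so the two leaves are merged into [10 20 30 40], and key 40 is removed from the parent, which becomes [10 100].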
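(c) Redistribute keys: delete 50 (n=4)
Before: parent [10 40 100]; leaves [10 20 30 35] [40 50]
After deleting 50, the leaf [40] is below the minimum, but its sibling [10 20 30 35] has a key to spare. Key 35 moves into the underfull leaf, giving [10 20 30] and [35 40], and the separating key in the parent changes from 40 to 35: [10 35 100].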
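(d) Non-leaf coalesce: delete 37 (n=4)
Before: root [25]; interior nodes [10 20] and [30 40]; leaves [1 3] [10 14] [20 22] [25 26] [30 37] [40 45]
After deleting 37, the leaf [30] is underfull, so it is merged with its sibling into [25 26 30], and key 30 is removed from the parent. That leaves the interior node [40] with only two pointers, below the non-leaf minimum of ⌈(n+1)/2⌉ = 3. It is coalesced with its sibling [10 20], pulling the root's key 25 down between them: [10 20 25 40]. The old root is left with a single child, so [10 20 25 40] becomes the new root.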
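Leaf-level deletion can be sketched in Python as below, for n = 4. This is a hypothetical, one-level sketch (`Node`, `delete_from_leaf`, and the filler leaves [1 5] and [100 120] are assumptions; sequence pointers are omitted). It uses the rule the examples above appear to follow: coalesce when both leaves fit in one node, otherwise move one key from the fuller sibling. A parent left underfull, as in case (d), is not repaired here.

```python
import bisect

N = 4                          # order used in the deletion examples
MIN_LEAF_KEYS = (N + 1) // 2   # floor((n+1)/2) keys per non-root leaf

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children   # None for a leaf

def delete_from_leaf(parent, key):
    """Delete `key` from the leaf child of `parent` that holds it, then repair
    an underfull leaf using one adjacent sibling (one level only)."""
    i = bisect.bisect_right(parent.keys, key)
    leaf = parent.children[i]
    leaf.keys.remove(key)                      # ValueError if key is absent
    if len(leaf.keys) >= MIN_LEAF_KEYS:
        return "simple"                        # case (a)
    left_i = i - 1 if i > 0 else i             # pair the leaf with a neighbor
    left, right = parent.children[left_i], parent.children[left_i + 1]
    if len(left.keys) + len(right.keys) <= N:  # case (b): both fit in one leaf
        left.keys += right.keys
        del parent.children[left_i + 1]
        del parent.keys[left_i]                # separator no longer needed
        return "coalesce"
    # case (c): move one key from the fuller sibling to the underfull leaf
    if len(left.keys) > len(right.keys):
        right.keys.insert(0, left.keys.pop())
    else:
        left.keys.append(right.keys.pop(0))
    parent.keys[left_i] = right.keys[0]        # new separator between them
    return "redistribute"

def example(middle):
    """Parent [10 40 100] with hypothetical filler leaves on either side."""
    return Node([10, 40, 100], [Node([1, 5]), Node(middle),
                                Node([40, 50]), Node([100, 120])])

p = example([10, 20, 30])
print(delete_from_leaf(p, 50), p.keys)   # coalesce [10, 100]
p = example([10, 20, 30, 35])
print(delete_from_leaf(p, 50), p.keys)   # redistribute [10, 35, 100]
```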
B-tree deletions in practice
– Often, coalescing is not implemented.
– It is too hard and not worth it!
Why do we take 3 as the number of levels of a B-tree?
Suppose our blocks are 4096 bytes. Also let keys be integers of 4 bytes and let pointers be 8 bytes. If there is no header information kept on the blocks, then we want to find the largest integer value of n such that $4n + 8(n+1) \le 4096$. That value is n = 340, so 340 key-pointer pairs could fit in one block for our example data. Suppose that the average block has an occupancy midway between the minimum and maximum, i.e., a typical block has 255 pointers. With a root with 255 children, there are 255 * 255 = 65,025 leaves, and among those leaves we shall have 255^3, or about 16.6 million, pointers to records. That is, files with up to 16.6 million records can be accommodated by a 3-level B-tree.
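The arithmetic above can be checked in a few lines of Python, using the same assumptions as the text: 4096-byte blocks, 4-byte keys, 8-byte pointers, no block header, and 255 pointers per typical block.

```python
BLOCK_BYTES, KEY_BYTES, PTR_BYTES = 4096, 4, 8

# Largest n with KEY_BYTES*n + PTR_BYTES*(n + 1) <= BLOCK_BYTES.
n = (BLOCK_BYTES - PTR_BYTES) // (KEY_BYTES + PTR_BYTES)
used = KEY_BYTES * n + PTR_BYTES * (n + 1)

fanout = 255                  # typical occupancy assumed in the text
leaves = fanout ** 2          # root -> 255 children -> 255 leaves each
records = fanout ** 3         # each leaf holds about 255 record pointers

print(n, used)                # 340 4088
print(leaves, records)        # 65025 16581375
```

Solving the inequality for n directly gives n <= (4096 - 8) / 12, and the floor division takes the largest integer that still fits; n = 341 would need 4100 bytes.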