MOD III System Softwares

Case Study-Oracle Mod III

History

The relational model was introduced in 1970. A research and development effort was initiated at IBM's San Jose Research Center. It led to the announcement of two commercial relational DBMS products by IBM in the 1980s: SQL/DS for the DOS/VSE (disk operating system/virtual storage extended) and VM/CMS (virtual machine/conversational monitoring system) environments, introduced in 1981; and DB2 for the MVS operating system, introduced in 1983. Another relational DBMS, INGRES, was developed at the University of California, Berkeley, in the early 1970s. INGRES became a commercial RDBMS marketed by Ingres, Inc., a subsidiary of ASK, Inc., and is presently marketed by Computer Associates. Other popular commercial RDBMSs include Oracle of Oracle, Inc.; Sybase of Sybase, Inc.; RDB of Digital Equipment Corp., now owned by Compaq; INFORMIX of Informix, Inc.; and UNIFY of Unify, Inc.

Besides these RDBMSs, many implementations of the relational data model appeared on the personal computer (PC) platform in the 1980s. These include RIM, RBASE 5000, PARADOX, OS/2 Database Manager, DBase IV, XDB, WATCOM SQL, SQL Server (of Sybase, Inc.), SQL Server (of Microsoft), and most recently Access (also of Microsoft). They were initially single-user systems, but more recently they have started offering the client/server database architecture and are becoming compliant with Microsoft's Open Database Connectivity (ODBC), a standard that permits the use of many front-end tools with these systems.

Basic Structure of the Oracle System

An Oracle server consists of an Oracle database (the collection of stored data, including log and control files) and the Oracle instance (the processes, including Oracle system processes and user processes taken together, created for a specific instance of the database operation). The Oracle server supports SQL to define and manipulate data. It also has a procedural language, called PL/SQL, to control the flow of SQL, to use variables, and to provide error-handling procedures. Oracle can also be accessed through general-purpose programming languages such as C or Java.

Traditionally, RDBMS vendors have chosen to use their own terminology in describing products in their documentation.

Oracle Database Structure

The Oracle database has two primary structures:
(1) a physical structure, referring to the actual stored data;
(2) a logical structure, corresponding to an abstract representation of the stored data, which roughly corresponds to the conceptual schema of the database.

An Oracle database contains:
- One or more data files; these contain the actual data.
- Two or more log files, called redo log files; these record all changes made to data and are used in recovery if certain changes do not get written to permanent storage.
- One or more control files; these contain control information such as the database name, file names and locations, and a database creation timestamp. These files are also needed for recovery purposes.
- Trace files and an alert log; each background process has a trace file associated with it, and the alert log records major database events.

Log files and control files may be multiplexed; that is, multiple copies may be written to multiple devices.

Oracle Instance

The set of processes that constitute an instance of the server's operation is called an Oracle instance, which consists of a System Global Area and a set of background processes. It has the following components:
- System global area (SGA): This area of memory is used for database information shared by users. Oracle assigns an SGA area when an instance starts. For optimal performance, the SGA is generally made as large as possible, while still fitting in real memory.

SGA

The SGA is divided into several types of memory structures:
1. Database buffer cache: This keeps the most recently accessed data blocks from the database. By keeping the most frequently accessed data blocks in this cache, disk I/O activity can be significantly reduced.
2. Redo log buffer: This is the buffer for the redo log file and is used for recovery purposes.
3. Shared pool: This contains shared memory constructs, including shared SQL areas, which hold parse trees of SQL queries and execution plans for executing SQL statements.
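The effect of keeping recently accessed blocks in memory can be illustrated with a small least-recently-used (LRU) cache. This is only a sketch under stated assumptions: the class name `BufferCache`, the `read_block` callable, and the plain LRU policy are hypothetical stand-ins, not Oracle's actual buffer replacement algorithm.

```python
from collections import OrderedDict

class BufferCache:
    """Toy LRU cache of data blocks (illustrative; not Oracle's real policy)."""

    def __init__(self, capacity, read_block):
        self.capacity = capacity
        self.read_block = read_block   # hypothetical callable: block id -> block contents
        self.blocks = OrderedDict()    # block id -> contents, least recently used first
        self.disk_reads = 0

    def get(self, block_id):
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # hit: mark as most recently used
            return self.blocks[block_id]
        self.disk_reads += 1                    # miss: fetch the block from "disk"
        data = self.read_block(block_id)
        self.blocks[block_id] = data
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used block
        return data

cache = BufferCache(capacity=2, read_block=lambda b: f"block-{b}")
for b in [1, 2, 1, 1, 3, 1]:
    cache.get(b)
print(cache.disk_reads)   # six accesses, but only three go to disk
```

Repeated accesses to block 1 are served from memory, which is the reduction in disk I/O the slide refers to.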

Oracle Instance (cont.)

- User processes: Each user process corresponds to the execution of some application or some tool.
- Program global area (PGA): This is a memory buffer that contains data and control information for a server process. A PGA is created by Oracle when a server process is started.
- Oracle processes: A process is a "thread of control" or a mechanism in an operating system that can execute a series of steps. A process has its own private memory area where it runs. Oracle processes are divided into server processes and background processes.

Oracle Processes

Oracle creates server processes to handle requests from connected user processes. The background processes are created for each instance of Oracle.

Background processes

- Database writer (DBWR): Writes the modified blocks from the buffer cache to the data files on disk.
- Log writer (LGWR): Writes from the log buffer area to the on-line disk log file.
- Checkpoint (CKPT): A checkpoint is an event at which all buffers in the SGA modified since the last checkpoint are written to the data files. The CKPT process works with DBWR to execute a checkpointing operation.
- System monitor (SMON): Performs instance recovery, manages storage areas by making the space contiguous, and recovers transactions skipped during recovery.
- Process monitor (PMON): Performs process recovery when a user process fails. It is also responsible for managing the cache and other resources used by a user process.
- Archiver (ARCH): Archives on-line log files to archival storage if configured to do so.
- Recoverer process (RECO): Resolves distributed transactions that are pending due to a network or system failure in a distributed database.
- Dispatchers (Dnnn): In multithreaded server configurations, route requests from connected user processes to available shared server processes. There is one dispatcher per standard communication protocol supported.
- Lock processes (LCKn): Used for inter-instance locking when Oracle runs in parallel server mode.

Oracle Startup and Shutdown

An Oracle database is not available to users until the Oracle server has been started up and the database has been opened. The following steps need to be taken:
1. Starting an instance of the database
2. Mounting a database
3. Opening a database
Performing these operations in reverse shuts down an Oracle instance.

Database Structure and Its Manipulation in Oracle

Oracle was originally designed as an RDBMS but is now an ORDBMS. The features of Oracle, including its relational and object-relational modeling facilities, are:
- Schema objects: tables, views, synonyms, indexes, and sequences.
- Oracle data dictionary: a read-only set of tables that keeps the metadata. Users are rarely given access to the base tables.
- SQL in Oracle: Oracle is compliant with the SQL ANSI/ISO standard.
- Methods in Oracle 8 (operations).
- Triggers.

Storage Organization in Oracle

Storage

Each database is divided into one or more tablespaces. At a minimum there is always a SYSTEM and a USERS tablespace. One or more data files are created in each tablespace. These data files are associated with only one database.

Physical Storage

Physical storage is organized in terms of data blocks, extents, and segments.
- Data blocks are the finest (smallest) unit of storage. They are also called logical blocks, pages, or Oracle blocks.
- An extent is a specific number of contiguous data blocks.
- A segment is a set of extents allocated for a data structure.

Every Oracle database contains a tablespace named SYSTEM, created automatically, which holds the data dictionary's objects.

Data Blocks

A data block has the following components:
- Header: Contains general block information such as the block address and the type of segment.
- Table directory: Contains information about tables that have data in the data block.
- Row directory: Contains information about the actual rows.

Data Blocks (cont.)

- Row data: Uses the bulk of the space of the data block. A row may span multiple blocks.
- Free space: Space allocated for row updates and new rows.

Two space management parameters, PCTFREE and PCTUSED, enable the DBA/designer to control the use of free space in data blocks.

Extents

The amount of space initially allocated to an extent is determined by the Create command. Incremental extents are allocated when the initial one becomes full, and their size is also determined by the Create command. All extents allocated to an index exist as long as the index exists.

Segments

There are four types of segments:
- Data segments: Each non-clustered table and each cluster has a single data segment.
- Index segments: Each index has a single index segment.
- Temporary segments: Used by SQL as a temporary work area.
- Rollback segments: Used for undoing transactions.

INDEXING & HASHING

Basic Concepts

Indexing mechanisms are used to speed up access to desired data.
Search key: an attribute or set of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form (search-key, pointer).
Index files are typically much smaller than the original file.
Two basic kinds of indices:
- Ordered indices: search keys are stored in sorted order.
- Hash indices: search keys are distributed uniformly across buckets using a hash function.

Index Evaluation Metrics

- Access time
- Insertion time
- Deletion time
- Space overhead
- Access types supported efficiently, e.g., records with a specified value in the attribute, or records with an attribute value falling in a specified range of values. This strongly influences the choice of index, and depends on usage.

Ordered Indices

In an ordered index, index entries are stored sorted on the search-key value, e.g., the author catalog in a library.
- Primary index: in a sequentially ordered file, the index whose search key specifies the sequential order of the file. Also called a clustering index. The search key of a primary index is usually, but not necessarily, the primary key.
- Secondary index: an index whose search key specifies an order different from the sequential order of the file. Also called a non-clustering index.
- Index-sequential file: an ordered sequential file with a primary index.

Dense Index Files

Dense index: an index record appears for every search-key value in the file.

Sparse Index Files

Sparse index: contains index records for only some search-key values. It is only applicable when records are sequentially ordered on the search key.
To locate a record with search-key value K we:
- Find the index record with the largest search-key value <= K.
- Search the file sequentially starting at the record to which the index record points.
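The sparse-index lookup can be sketched as follows. The account records are illustrative data, the block size of three records is an assumption, and search keys are assumed unique so that the scan can stop at the first larger key.

```python
from bisect import bisect_right

# File sorted on the search key (account number), three records per block (toy data).
blocks = [
    [("A-101", ("Downtown", 500)), ("A-102", ("Perryridge", 400)), ("A-110", ("Downtown", 600))],
    [("A-201", ("Perryridge", 900)), ("A-215", ("Mianus", 700)), ("A-217", ("Brighton", 750))],
    [("A-218", ("Perryridge", 700)), ("A-222", ("Redwood", 700)), ("A-305", ("Round Hill", 350))],
]

# Sparse index: one entry per block, holding the first search key in that block.
index_keys = [block[0][0] for block in blocks]

def lookup(key):
    # Step 1: find the index entry with the largest search key <= key.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None                      # key is smaller than every key in the file
    # Step 2: scan the file sequentially from the block that entry points to.
    for block in blocks[pos:]:
        for k, record in block:
            if k == key:
                return record
            if k > key:
                return None              # passed the place where key would be stored
    return None

print(lookup("A-215"))
```

Only three index entries are kept for nine records, at the price of a short sequential scan after the index probe.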

Multilevel Index

If the primary index does not fit in memory, access becomes expensive.
Solution: treat the primary index kept on disk as a sequential file and construct a sparse index on it.
- Outer index: a sparse index of the primary index.
- Inner index: the primary index file.
If even the outer index is too large to fit in main memory, yet another level of index can be created, and so on.
Indices at all levels must be updated on insertion or deletion from the file.
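A two-level lookup can be sketched as below. The integer keys, the fan-out of 10 index entries per index block, and the helper names are assumptions chosen only to keep the example small.

```python
from bisect import bisect_right

FANOUT = 10  # assumed number of index entries per index block

# Inner (primary) index: one (first key in block, file block number) entry per file block.
inner = [(first_key, block_no) for block_no, first_key in enumerate(range(0, 1000, 10))]

# Outer index: a sparse index on the inner index, one entry per inner index block.
outer = [(inner[i][0], i) for i in range(0, len(inner), FANOUT)]

def find_block(key):
    """Return the number of the file block that may contain `key`, or None."""
    outer_keys = [k for k, _ in outer]
    o = bisect_right(outer_keys, key) - 1       # largest outer key <= key
    if o < 0:
        return None
    start = outer[o][1]
    segment = inner[start:start + FANOUT]       # the one inner index block to read
    seg_keys = [k for k, _ in segment]
    i = bisect_right(seg_keys, key) - 1         # largest inner key <= key
    return segment[i][1]

print(find_block(257))
```

Each lookup touches one outer entry list and one inner index block, instead of searching the whole inner index.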

Index Update: Deletion

If the deleted record was the only record in the file with its particular search-key value, the search key is deleted from the index also.
Single-level index deletion:
- Dense indices: deletion of the search key is similar to file record deletion.
- Sparse indices: if an entry for the search key exists in the index, it is deleted by replacing the entry in the index with the next search-key value in the file (in search-key order). If the next search-key value already has an index entry, the entry is deleted instead of being replaced.

Index Update: Insertion

Single-level index insertion:
- Perform a lookup using the search-key value appearing in the record to be inserted.
- Dense indices: if the search-key value does not appear in the index, insert it.
- Sparse indices: if the index stores an entry for each block of the file, no change needs to be made to the index unless a new block is created. If a new block is created, the first search-key value appearing in the new block is inserted into the index.
Multilevel insertion (as well as deletion) algorithms are simple extensions of the single-level algorithms.

Non-clustering Indices

Frequently, one wants to find all the records whose values in a certain field satisfy some condition, and the file is not ordered on the field.
- Example 1: In the account database stored sequentially by account number, we may want to find all accounts in a particular branch.
- Example 2: As above, but where we want to find all accounts with a specified balance or range of balances.
We can have a non-clustering index with an index record for each search-key value. The index record points to a bucket that contains pointers to all the actual records with that particular search-key value.

Figure: Secondary index on the balance field of account.

Clustering and Non-clustering

- Non-clustering indices have to be dense.
- Indices offer substantial benefits when searching for records.
- When a file is modified, every index on the file must be updated. Updating indices imposes overhead on database modification.
- A sequential scan using a clustering index is efficient, but a sequential scan using a non-clustering index is expensive: each record access may fetch a new block from disk.

Secondary Indices

Frequently, one wants to find all the records whose values in a certain field (which is not the search key of the primary index) satisfy some condition.
- Example 1: In the account relation stored sequentially by account number, we may want to find all accounts in a particular branch.
- Example 2: As above, but where we want to find all accounts with a specified balance or range of balances.
We can have a secondary index with an index record for each search-key value.

Secondary Indices Example

The index record points to a bucket that contains pointers to all the actual records with that particular search-key value.
Secondary indices have to be dense, since the file is not sorted by the search key.
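A secondary index with buckets of record pointers can be sketched like this. The account records are illustrative data, list positions stand in for record pointers, and the function names are hypothetical.

```python
from collections import defaultdict

# Account file stored sequentially by account number; list position = record pointer.
accounts = [
    ("A-101", "Downtown", 500),
    ("A-102", "Perryridge", 400),
    ("A-110", "Downtown", 600),
    ("A-201", "Perryridge", 900),
    ("A-215", "Mianus", 700),
    ("A-217", "Brighton", 750),
    ("A-218", "Perryridge", 700),
    ("A-222", "Redwood", 700),
    ("A-305", "Round Hill", 350),
]

def build_secondary_index(records, field):
    """Dense index: one entry per search-key value, each pointing to a bucket of pointers."""
    buckets = defaultdict(list)
    for position, record in enumerate(records):
        buckets[record[field]].append(position)
    return dict(sorted(buckets.items()))      # index entries kept in search-key order

balance_index = build_secondary_index(accounts, field=2)

def accounts_in_balance_range(low, high):
    result = []
    for balance, bucket in balance_index.items():
        if low <= balance <= high:
            result.extend(accounts[p][0] for p in bucket)   # follow each pointer
    return result

print(balance_index[700])
print(accounts_in_balance_range(400, 600))
```

The pointers in one bucket lead to scattered positions in the file, which is why scanning through a secondary index may fetch a different block for each record.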

Primary and Secondary Indices

Indices offer substantial benefits when searching for records, but updating indices imposes overhead on database modification: when a file is modified, every index on the file must be updated.
A sequential scan using a primary index is efficient, but a sequential scan using a secondary index is expensive:
- Each record access may fetch a new block from disk.
- A block fetch requires about 5 to 10 milliseconds, versus about 100 nanoseconds for memory access.

Hashing

Static Hashing

A bucket is a unit of storage containing one or more records (a bucket is typically a disk block).
In a hash file organization we obtain the bucket of a record directly from its search-key value using a hash function.
Hash function h is a function from the set of all search-key values K to the set of all bucket addresses B.
The hash function is used to locate records for access, insertion, and deletion.
Records with different search-key values may be mapped to the same bucket; thus the entire bucket has to be searched sequentially to locate a record.

Example of Hash File Organization

Figure: Hash file organization of the account file, using branch_name as key.

Example of Hash File Organization (cont.)

- There are 10 buckets.
- The binary representation of the ith letter of the alphabet is assumed to be the integer i.
- The hash function returns the sum of the binary representations of the characters modulo 10.
- E.g., h(Perryridge) = 5, h(Round Hill) = 3, h(Brighton) = 3.
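The example hash function can be written out directly. This sketch assumes that letters are counted case-insensitively (a = 1, ..., z = 26) and that non-letters such as the space in "Round Hill" contribute nothing; those assumptions reproduce the three values quoted on the slide.

```python
def h(branch_name, n_buckets=10):
    """Sum of letter positions in the alphabet (a=1 ... z=26), modulo the bucket count."""
    total = sum(ord(c) - ord("a") + 1 for c in branch_name.lower() if c.isalpha())
    return total % n_buckets

# Static hash file: a fixed list of buckets, each holding the records hashed to it.
buckets = [[] for _ in range(10)]
for name in ["Perryridge", "Round Hill", "Brighton"]:
    buckets[h(name)].append(name)

for name in ["Perryridge", "Round Hill", "Brighton"]:
    print(name, h(name))
```

Round Hill and Brighton land in the same bucket, so a lookup for either must search that bucket sequentially.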

Hash Functions

- The worst hash function maps all search-key values to the same bucket; this makes access time proportional to the number of search-key values in the file.
- An ideal hash function is uniform, i.e., each bucket is assigned the same number of search-key values from the set of all possible values.
- An ideal hash function is random, so each bucket will have the same number of records assigned to it irrespective of the actual distribution of search-key values in the file.
- Typical hash functions perform computation on the internal binary representation of the search key. For example, for a string search key, the binary representations of all the characters in the string could be added and the sum modulo the number of buckets could be returned.

Handling of Bucket Overflows

Bucket overflow can occur because of:
- Insufficient buckets.
- Skew in the distribution of records. This can occur due to two reasons:
  - multiple records have the same search-key value;
  - the chosen hash function produces a non-uniform distribution of key values.
Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is handled by using overflow buckets.

Handling of Bucket Overflows (Cont.)

Overflow chaining: the overflow buckets of a given bucket are chained together in a linked list.

Above scheme is called closed hashing. An alternative, called open hashing, which does not use overflow buckets, is not suitable for database applications.
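Overflow chaining can be sketched with a bucket that links to further buckets once it is full. The `Bucket` class, its capacity of two records, and the sample records are assumptions for illustration only.

```python
class Bucket:
    """Fixed-capacity bucket with a linked chain of overflow buckets."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.records = []
        self.overflow = None          # next bucket in the overflow chain, if any

    def insert(self, record):
        if len(self.records) < self.capacity:
            self.records.append(record)
            return
        if self.overflow is None:     # bucket full: extend the chain on demand
            self.overflow = Bucket(self.capacity)
        self.overflow.insert(record)

    def search(self, key):
        # The whole chain must be scanned, since records may be in any bucket of it.
        found, bucket = [], self
        while bucket is not None:
            found.extend(r for r in bucket.records if r[0] == key)
            bucket = bucket.overflow
        return found

    def chain_length(self):
        return 1 + (self.overflow.chain_length() if self.overflow else 0)

b = Bucket(capacity=2)
for record in [("Perryridge", "A-102"), ("Perryridge", "A-201"), ("Perryridge", "A-218"),
               ("Brighton", "A-217"), ("Round Hill", "A-305")]:
    b.insert(record)
print(b.chain_length(), b.search("Perryridge"))
```

Long chains bring lookups back toward sequential search, which is the performance degradation discussed for static hashing.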

Hash Indices

Hashing can be used not only for file organization, but also for index-structure creation. A hash index organizes the search keys, with their associated record pointers, into a hash file structure.

Deficiencies of Static Hashing

In static hashing, function h maps search-key values to a fixed set B of bucket addresses. Databases grow or shrink with time.
- If the initial number of buckets is too small and the file grows, performance will degrade due to too many overflows.
- If space is allocated for anticipated growth, a significant amount of space will be wasted initially (and buckets will be underfull).
- If the database shrinks, again space will be wasted.
One solution: periodic re-organization of the file with a new hash function. This is expensive and disrupts normal operations.
Better solution: allow the number of buckets to be modified dynamically.

Dynamic Hashing

- Good for a database that grows and shrinks in size.
- Allows the hash function to be modified dynamically.
- Extendable hashing is one form of dynamic hashing:
  - The hash function generates values over a large range, typically b-bit integers, with b = 32. (Note that 2^32 is quite large!)
  - At any time, use only a prefix of the hash function to index into a table of bucket addresses.
  - Let the length of the prefix be i bits, 0 <= i <= 32.
  - Bucket address table size = 2^i. Initially i = 0.
  - The value of i grows and shrinks as the size of the database grows and shrinks.
  - Multiple entries in the bucket address table may point to the same bucket. Thus, the actual number of buckets is <= 2^i.
  - The number of buckets also changes dynamically due to coalescing and splitting of buckets.
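Taking the first i bits of a b-bit hash value is a single shift. In this sketch `zlib.crc32` is an assumed stand-in for the 32-bit hash function (the slides do not name one), and `table_slot` is a hypothetical helper name.

```python
import zlib

B = 32  # hash values are B-bit integers

def h(key):
    """Assumed 32-bit hash function; any uniform 32-bit hash would do."""
    return zlib.crc32(key.encode("utf-8"))

def table_slot(hash_value, i):
    """Index into a bucket address table of size 2**i: the i high-order bits."""
    return hash_value >> (B - i)

value = 0b1011 << 28              # a hash value whose four leading bits are 1011
print(table_slot(value, 0))       # i = 0: a single table entry
print(table_slot(value, 2))       # leading bits 10
print(table_slot(value, 4))       # leading bits 1011
```

Growing i by one splits every table slot into two, without rehashing any key: the same hash value simply contributes one more bit.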

General Extendable Hash Structure

In this structure, i_2 = i_3 = i, whereas i_1 = i - 1.

Use of Extendable Hash Structure

Each bucket j stores a value i_j. All the entries that point to the same bucket have the same values on the first i_j bits.
To locate the bucket containing search key K_j:
1. Compute h(K_j) = X.
2. Use the first i high-order bits of X as a displacement into the bucket address table, and follow the pointer to the appropriate bucket.
To insert a record with search-key value K_j:
- Follow the same procedure as look-up and locate the bucket, say j.
- If there is room in bucket j, insert the record in the bucket.
- Else the bucket must be split and the insertion re-attempted. (Overflow buckets are used instead in some cases.)

Insertion in Extendable Hash Structure (Cont.)

To split a bucket j when inserting a record with search-key value K_j:
- If i > i_j (more than one pointer to bucket j):
  - Allocate a new bucket z, and set i_j = i_z = (i_j + 1).
  - Update the second half of the bucket address table entries originally pointing to j, to point to z.
  - Remove each record in bucket j and reinsert it (in j or z).
  - Recompute the new bucket for K_j and insert the record in the bucket (further splitting is required if the bucket is still full).
- If i = i_j (only one pointer to bucket j):
  - If i reaches some limit b, or too many splits have happened in this insertion, create an overflow bucket.
  - Else:
    - Increment i and double the size of the bucket address table.
    - Replace each entry in the table by two entries that point to the same bucket.
    - Recompute the new bucket address table entry for K_j. Now i > i_j, so use the first case above.
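The lookup, split, and table-doubling steps can be put together in a compact sketch. It rests on several assumptions that are not in the slides: `zlib.crc32` stands in for the 32-bit hash function, search keys are unique, and overflow buckets are not modelled (the code raises an error where one would be needed). Class and method names are hypothetical.

```python
import zlib

B = 32  # hash values are B-bit integers

def h(key):
    return zlib.crc32(key.encode("utf-8"))   # assumed stand-in hash function

class Bucket:
    def __init__(self, depth):
        self.depth = depth    # i_j: number of leading hash bits shared by all keys here
        self.items = {}       # search key -> record (keys assumed unique)

class ExtendableHash:
    def __init__(self, bucket_size=2):
        self.bucket_size = bucket_size
        self.i = 0                    # prefix length currently in use
        self.table = [Bucket(0)]      # bucket address table with 2**i entries

    def _slot(self, key):
        return h(key) >> (B - self.i)         # first i high-order bits of h(key)

    def lookup(self, key):
        return self.table[self._slot(key)].items.get(key)

    def insert(self, key, record):
        while True:
            bucket = self.table[self._slot(key)]
            if key in bucket.items or len(bucket.items) < self.bucket_size:
                bucket.items[key] = record
                return
            if bucket.depth == B:
                raise OverflowError("an overflow bucket would be needed (not modelled)")
            if bucket.depth == self.i:
                # Only one pointer to the bucket: double the table, duplicating entries.
                self.table = [b for b in self.table for _ in (0, 1)]
                self.i += 1
            self._split(bucket)               # now i > i_j; then retry the insertion

    def _split(self, bucket):
        bucket.depth += 1
        new_bucket = Bucket(bucket.depth)
        # Redirect the second half of the entries that pointed to the old bucket.
        slots = [s for s, b in enumerate(self.table) if b is bucket]
        for s in slots[len(slots) // 2:]:
            self.table[s] = new_bucket
        # Remove every record from the old bucket and reinsert it (old or new bucket).
        old_items, bucket.items = bucket.items, {}
        for k, r in old_items.items():
            self.table[self._slot(k)].items[k] = r

data = {"A-101": 500, "A-102": 400, "A-110": 600, "A-201": 900, "A-215": 700,
        "A-217": 750, "A-218": 700, "A-222": 700, "A-305": 350}
eh = ExtendableHash(bucket_size=2)
for account, balance in data.items():
    eh.insert(account, balance)
print(eh.i, len(eh.table), eh.lookup("A-215"))
```

The final value of i depends on the hash function, so it is printed rather than stated here; what holds in every run is that the table has 2^i entries and a bucket with depth i_j is referenced by 2^(i - i_j) of them.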

Deletion in Extendable Hash Structure

To delete a key value:
- Locate it in its bucket and remove it.
- The bucket itself can be removed if it becomes empty (with appropriate updates to the bucket address table).
- Coalescing of buckets can be done (a bucket can coalesce only with a "buddy" bucket having the same value of i_j and the same i_j - 1 prefix, if it is present).
- Decreasing the bucket address table size is also possible.
Note: decreasing the bucket address table size is an expensive operation and should be done only if the number of buckets becomes much smaller than the size of the table.

Use of Extendable Hash Structure: Example

Figures (bucket size = 2):
- Initial hash structure.
- Hash structure after insertion of one Brighton and two Downtown records.
- Hash structure after insertion of the Mianus record.
- Hash structure after insertion of three Perryridge records.
- Hash structure after insertion of the Redwood and Round Hill records.


Extendable Hashing vs. Other Schemes

Benefits of extendable hashing:
- Hash performance does not degrade with growth of the file.
- Minimal space overhead.
Disadvantages of extendable hashing:
- Extra level of indirection to find the desired record.
- The bucket address table may itself become very big (larger than memory).
  - Cannot allocate very large contiguous areas on disk either.
  - Solution: a B+-tree structure to locate the desired record in the bucket address table.
- Changing the size of the bucket address table is an expensive operation.

Comparison of Ordered Indexing and Hashing

- Cost of periodic re-organization.
- Relative frequency of insertions and deletions.
- Is it desirable to optimize average access time at the expense of worst-case access time?
- Expected type of queries:
  - Hashing is generally better at retrieving records having a specified value of the key.
  - If range queries are common, ordered indices are to be preferred. Consider, e.g., a query with: where A >= v1 and A <= v2
In practice:
- PostgreSQL supports hash indices, but discourages their use due to poor performance.
- Oracle supports static hash organization, but not hash indices.
- SQL Server supports only B+-trees.
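The difference between the two query types can be seen with a sorted list standing in for an ordered index and a dictionary standing in for a hash index. The balances are illustrative data and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

balances = [500, 400, 600, 900, 700, 750, 700, 700, 350]   # toy data, in file order

# Ordered "index": the search keys in sorted order.
ordered = sorted(balances)

# Hash "index": search key -> positions of matching records.
hashed = {}
for position, balance in enumerate(balances):
    hashed.setdefault(balance, []).append(position)

def range_query(v1, v2):
    """where A >= v1 and A <= v2: two binary searches, then one contiguous slice."""
    return ordered[bisect_left(ordered, v1):bisect_right(ordered, v2)]

def point_query(v):
    """where A = v: a single hash probe."""
    return hashed.get(v, [])

print(range_query(500, 700))
print(point_query(700))
```

The hash structure gives no help for the range query: nearby key values hash to unrelated buckets, so every value in the range would have to be probed separately or the whole file scanned.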

Index Definition in SQL

Create an index:
  create index <index-name> on <relation-name> (<attribute-list>)
  E.g.: create index b_index on branch(branch_name)
Use create unique index to indirectly specify and enforce the condition that the search key is a candidate key. This is not really required if the SQL unique integrity constraint is supported.
To drop an index:
  drop index <index-name>
Most database systems allow specification of the type of index, and clustering.
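These statements can be tried out from Python with the standard `sqlite3` module. SQLite is used here only as a convenient stand-in, not as Oracle; the column names other than branch_name, the index name `b_unique`, and the sample row are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table branch (branch_name text, branch_city text, assets integer)")

# Plain index and unique index on the same search key.
conn.execute("create index b_index on branch(branch_name)")
conn.execute("create unique index b_unique on branch(branch_name)")

conn.execute("insert into branch values ('Perryridge', 'Horseneck', 1700000)")
try:
    # Same branch_name again: the unique index enforces the candidate-key condition.
    conn.execute("insert into branch values ('Perryridge', 'Elsewhere', 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

conn.execute("drop index b_index")
remaining = [row[0] for row in
             conn.execute("select name from sqlite_master where type = 'index'")]
print(duplicate_rejected, remaining)
```

The index name uses an underscore because a hyphen is not allowed in an unquoted SQL identifier.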

Indexing in Oracle

Oracle supports B+-tree indices as the default for the create index SQL command.
A new non-null attribute, row-id, is added to all indices, so as to guarantee that all search keys are unique.
Indices are supported on:
- attributes and attribute lists,
- results of functions over attributes,
- or using structures external to Oracle (domain indices).
Bitmap indices are also supported, but for that an explicit declaration is needed:
  create bitmap index <index-name> on <relation-name> (<attribute-list>)

Hashing in Oracle

Hash indices are not supported.
However, (limited) static hash file organization is supported for partitions:
  create table ... partition by hash(<attribute-list>) partitions <N> stored in (<tablespaces>)
Index files can also be partitioned using a hash function:
  create index ... global partition by hash(<attribute-list>) partitions <N>
This creates a global index partitioned by the hash function.
(Global) indexing over a hash-partitioned table is also possible.
Hashing may also be used to organize clusters in multitable clusters.
