
Page 1: Week 11 - Improving Database Performance
Semester 1 2005

Page 2: Improving Database Performance

So far, we have looked at many aspects of designing, creating, populating and querying a database. We have (briefly) explored 'optimisation', which is used to ensure that query execution time is minimised.

In this lecture we are going to look at some techniques which are used to improve performance and availability

WHY ?

Because databases are required to be available, in many installations and applications, 24 hours a day, 7 days a week, 52 weeks every year - think of the ‘user’ demands in e-business

Page 3: Improving Database Performance

There are many ‘solutions’ - including

parallel processors

faster processors

higher speed communications

more memory

faster disks

more disk units on line

higher capacity disks

any others ?

Page 4: Improving Database Performance

We are going to look at a technique called ‘clustering’ - an architecture for improving ‘power’ and availability

What are the ‘dangers’ to non-stop availability

Try these :-– System outages (planned)

» Maintenance, tuning– System outages (unplanned)

» hardware failure, bugs, virus attacks

Page 5: Improving Database Performance

E-business is not the only focus

Businesses are tending to be ‘global organisations’ - remember one of the early lectures ?

So what is one of the 'solutions' ?

In a single word - clustering

Clustering is based on the premise that multiple processors can provide better, faster and more reliable processing than a single computer

Page 6: Improving Database Performance

However, as in most ‘simple’ solutions in Information Technology, the problem is in the details

How can clustering be achieved ?

Which technologies and architectures offer the best approach to clustering ?

– and, what is the measure, or metric, of ‘best’ ?

Page 7: Improving Database Performance

What are some of the advantages of clustering ?
– Improved availability of services
– Scalability

Clustering involves multiple and independent computing systems which work together as one

When one of the independent systems fails, the cluster software can distribute work from the failing system to the remaining systems in the cluster

Page 8: Improving Database Performance

‘Users’ normally would not notice the difference

– They interact with a cluster as if it were a single server - and importantly the resources they require will still be available

Clustering can provide high levels of availability

Page 9: Improving Database Performance

What about ‘scalability’ ?

Loads will (sometimes) exceed the capabilities of the systems which make up the cluster.

Additional facilities can be added incrementally to increase the cluster's computing power and ensure processing requirements are met

As transaction and processing loads become established, the cluster (or parts of it) can be increased in size or number

Page 10: Improving Database Performance

Clustering is NOT a ‘new’ concept

A company named DEC introduced them for VMS systems in the early 1980s - about 20 years ago

Which firms offer clustering packages now ?

IBM, Microsoft and Sun Microsystems

Page 11: Improving Database Performance

What are the different types of Clustering ?

There are 2 architectures :
– Shared nothing and
– Shared disk

In a shared nothing architecture, each system has its own private memory and one or more disks

Each server in the cluster has its own independent subset of the data, which it can work on without meeting resource contention from other servers

Page 12: Improving Database Performance

This might explain better :

[Diagram: A Shared Nothing Architecture - CPUs 1 to n, each with its own memory and disks, linked by an interconnection network]

Page 13: Improving Database Performance

As you saw on the previous overhead, in a shared nothing environment each system has its own 'private memory' and one or more disks

And each server in the cluster has its own independent subset of the data, which it can work on without meeting resource conflicts from other servers

The clustered processors communicate by passing messages through a network which interconnects the computers

Page 14: Improving Database Performance

Client requests are automatically directed to the system which owns the particular resource

Only one of the clustered systems can ‘own’ and access a particular resource at a time.

When a failure occurs, resource ownership can be dynamically transferred to another system in the cluster

Theoretically, a shared nothing multiprocessor could scale up to thousands of processors - the processors don’t interfere with one another - no resources are shared

Page 15: Improving Database Performance

[Diagram: A Shared All (shared disk) Environment - CPUs 1 to n, each with its own memory, all connected through an interconnecting network to the same shared disks]

Page 16: Improving Database Performance

In a ‘shared all’ environment, you noticed that all of the connected systems shared the same disk devices

Each processor has its own private memory, but all the processors can directly access all the disks

In addition, each server has access to all the data

Page 17: Improving Database Performance

In this arrangement, ‘shared all’ clustering doesn’t scale as effectively as shared-nothing clustering for small machines. All the nodes have access to the same data, so a controlling facility must be used to direct processing to make sure that all nodes have a consistent view of the data as it changes

Attempts by more than one node to update the same data at the same time need to be prevented. This can cause performance and scalability problems

(similar to the concurrency aspect)

Page 18: Improving Database Performance

Shared-all architectures are well suited to the large scale processing found in mainframe environments

Mainframes are large processors capable of handling high workloads. The number of clustered PCs or midrange processors, even with the newer, faster processors, needed to equal the computing power of a few clustered mainframes would be high - around 250 nodes.

Page 19: Improving Database Performance

This chart might help :

Shared Disk
– Quick adaptability to changing workloads
– High availability
– Data need not be partitioned

Shared Nothing
– High possibility of simpler, cheaper hardware
– Almost unlimited scalability
– Data may need to be partitioned across the cluster

Page 20: Improving Database Performance

There is another technique - the InfiniBand architecture - which can reduce bottlenecks at the Input/Output level, and which has the further appeal of reducing the cabling, connector and administrative overheads of the database infrastructure

It includes an 'intelligent' agent - meaning software.

Its main attraction is that it can change the way information is exchanged in applications. It removes unnecessary overheads from a system

Page 21: Improving Database Performance

Peripheral Component Interconnect (PCI) remains a bus-based system - this allows the transfer of data between only 2 (yes, 2!) of the attached members at a time

Many PCI buses cause bottlenecks in the bridge to the memory subsystems. Newer versions of PCI allowed only a minor improvement - only two 64-bit, 66 MHz adapters on the bus

A bus allows only a small number of devices to be interconnected, is limited in its electrical paths, and cannot adapt to meet high availability demands

Page 22: Improving Database Performance

A newer device, called a fabric, can scale to thousands of devices with parallel communications between each node

In 1999 a group of vendors (Intel, Microsoft, Sun, IBM and Compaq among them, merging their Future I/O and Next Generation I/O efforts) formed the InfiniBand Trade Association

Their objective was to develop and ensure one standard for communication interconnects and system I/O.

One of their early findings was that replacing a bus architecture by fabric was not the full story - or solution

Page 23: Improving Database Performance

Their early solution needed to be synchronised with software changes. If not, a very high speed network could be developed, but actual application demands would not be met

InfiniBand comprises :

1. Host Channel Adapters (HCAs)

2. Target Channel Adapters (TCAs)

3. Switches

4. Routers

1 and 2 define end nodes in the fabric; 3 and 4 are interconnecting devices

Page 24: Improving Database Performance

The HCA (host channel adapter) manages a connection and interfaces with the fabric

The TCA (target channel adapter) delivers required data to a device such as a disk interface, which replaces the existing SCSI interface

An InfiniBand switch links HCAs and TCAs into a network

The router allows the interface to other networks AND the translation of legacy components and networks. It can be used for MAN and WAN interfaces.

Page 25: Improving Database Performance

InfiniBand link speeds are identified in multiples of the base 1x rate (0.5 GB per second full duplex - 0.25 GB per second in each direction)

Other defined sizes are 4x (2 GB per second full duplex) and 12x (6 GB per second full duplex).

Just for size :

A fast SCSI adapter could accommodate a throughput rate of 160 MB per second

A single 4x InfiniBand adapter can deliver between 300 and 500 MB per second
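To put those numbers side by side: a 4x link is 4 × 0.25 GB per second in each direction, i.e. 1 GB per second each way (2 GB per second full duplex), so even after protocol and adapter overheads bring the delivered figure down to the 300 to 500 MB per second quoted above, it is still well ahead of the 160 MB per second of a fast SCSI adapter.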

Page 26: Improving Database Performance

[Diagram: an InfiniBand fabric - end nodes attached through switches, with routers linking the fabric to other networks]

Page 27: Improving Database Performance

So far we have looked at improving database performance by

1. The use of ‘shared-all’ or ‘shared-nothing’ architectures

2. Implementing an InfiniBand communications interface and network facility

Page 28: Improving Database Performance

Now we are going to look at another option

It’s known as the ‘Federated Database’ environment

So, what is a ‘Federated Database’ ?

Try this : it is a collection of data stored on multiple autonomous computing systems connected to a network.

The users are presented with what appears to be one integrated database

Page 29: Improving Database Performance

A federated database presents ‘views’ to users which look exactly the same as views of data from a centralised database

This is very similar to the use of the Internet where many sites have multiple sources - but the user doesn’t see them

In a federated database approach, each data resource is defined (as you have done) by means of table schemas, and the user is able to access and manipulate data

Page 30: Improving Database Performance

The ‘queries’ actually access data from a number of databases at a number of locations

One of the interesting aspects of a federated database is that the individual databases may run under any DBMS (IBM, Oracle, SQL Server, possibly MS Access), on any operating system (Unix, VMS, MS Windows XP) and on different hardware (Hewlett-Packard servers, Unisys, IBM, Sun Microsystems, ...)
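As a concrete sketch of what such a query can look like in an Oracle-to-Oracle federation (the link name, connect details and table names below are invented for the illustration; a heterogeneous federation would use gateway software in the same role):

    -- Register a remote sales database under a local name (hypothetical details)
    CREATE DATABASE LINK sales_remote
      CONNECT TO report_user IDENTIFIED BY report_pwd
      USING 'SALES_REMOTE';    -- network alias of the remote server

    -- One query joining a local table to a table on the remote database
    SELECT c.customer_name, SUM(o.order_total)
    FROM   customers c,            -- local table
           orders@sales_remote o   -- remote table, reached through the link
    WHERE  c.customer_id = o.customer_id
    GROUP  BY c.customer_name;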

Page 31: Improving Database Performance

However, there are some reservations :

Acceptable performance requires the inclusion of a smart optimiser using the cost-based technique which has intelligence about both the distribution (perhaps a global data dictionary) and also the different hardware and DBMS at each accessed site.

Page 32: Improving Database Performance

Another attractive aspect of the federated arrangement is that additional database servers can be added to the federation at any time - and servers can also be deleted.

As a general comment, any multisource database can be implemented in either a centralised or federated architecture

In the next few overheads, there are some comments on this

Page 33: Improving Database Performance

The centralised approach has some disadvantages, the major one being that investment is large, and the return on investment may take many months, or years

The process includes these steps:

1. Concept development and data model for collecting data needed to support business decisions and processes

2. Identification of useful data sources (accurate, timely, comprehensive, available …)

Page 34: Improving Database Performance

3. Obtain a database server platform to support the database (and probably lead to data warehousing).

4. Capture data, or extract data, from the source(s)

5. Clean, format, and transform data to the quality and formats required

6. Design an integrated database to hold this data

7. Load the database (and review quality)

Page 35: Improving Database Performance

8. Develop systems to ensure that content is current (probably transaction systems)

From this point, that database becomes ‘usable’

So, what is different with the Federated Database approach ?

1. Firstly, the economics are different - the investment in the large, high speed processor is not necessary

2. Data is not centralised - it remains with and on the systems used to maintain it

Page 36: Improving Database Performance

3. The database server can be a mid-range machine, or several such servers.

4. Another aspect is that it is most unlikely that a query would regularly need access to all of the individual databases - but with the centralised approach all of the data needs to be 'central'.

5. Local databases support local queries - that's probably why the local databases were introduced.

Page 37: Improving Database Performance

The Internet offers the capability of large federations of content servers

Distributed application architectures built around Web servers and many co-operating databases are (slowly) becoming common both
– within and
– between
enterprises (companies).

Users are normally unaware of the interfacing and supporting software necessary for federated databases to be accessible

Page 38: Improving Database Performance

Finally, there is another aspect which is used to improve the availability and performance of a database

This occurs at the ‘configuration stage’ which is when the database and its requirements are being ‘created’

- quite different from the ‘create table’ which you have used

It is the responsibility of the System Administration and Database Management (and of course Senior / Executive Management)

Page 39: Improving Database Performance

Physical Layouts

The physical layout very much influences
– How much data a database can hold
– The number of concurrent database users
– How many concurrent processes can execute
– Recovery capability
– Performance (response time)
– The nature of Database Administration
– Cost
– Expansion

Page 40: Improving Database Performance - Oracle Architecture

Oracle8i and 9i are object-relational database management systems. They contain the capabilities of relational and object-oriented database systems

They are used as database servers for many types of business applications, including

– On Line Transaction Processing (OLTP)

– Decision Support Systems

– Data Warehousing

Page 41: Improving Database Performance - Oracle Architecture

In perspective, Oracle is NOT a ‘high end’ application DBMS

A high end system has one or more of these characteristics :
– Management of a very large database (VLDB) - probably hundreds of gigabytes or terabytes
– Provides access to many concurrent users - in the thousands, or tens of thousands
– Gives a guarantee of constant database availability for mission critical applications - 24 hours a day, 7 days a week.

Page 42: Improving Database Performance - Oracle Architecture

High end applications environments are not normally controlled by Relational Database Management Systems

High end database environments are controlled by mainframe computers and non-relational DBMSs.

Current RDBMSs cannot manage very large amounts of data, or perform well under demanding transaction loads.

Page 43: Improving Database Performance - Oracle Architecture

There are some guidelines for designing a database with files distributed so that optimum performance, from a specific configuration, can be achieved

The primary aspect which needs to be clearly understood is the nature of the database
– Is it transaction oriented ?
– Is it read-intensive ?

Page 44: Improving Database Performance

The key items which need to be understood are

– Identifying Input/Output contention among datafiles
– Identifying Input/Output bottlenecks among all database files
– Identifying concurrent Input/Output among background processes
– Defining the security and performance goals for the database
– Defining the system hardware and mirroring architecture
– Identifying disks which can be dedicated to the database

Page 45: Improving Database Performance

Let’s look at tablespaces :

These ones will be present in some combination

System      Data dictionary
Data        Standard-operation tables
Data_2      Static tables used during standard operation
Indexes     Indexes for the standard-operation tables
Indexes_2   Indexes for the static tables
RBS         Standard-operation rollback segments
RBS_2       Special rollback segments used for data loads
Temp        Standard-operation temporary segments
Temp_user   Temporary segments created for a specific user

Page 46: Improving Database Performance

Tools       RDBMS tools tables
Tools_1     Indexes for the RDBMS tools tables
Users       User objects in development
Agg_data    Aggregation data and materialised views
Partitions  Partitions of a table or index segment; create multiple tablespaces for them
Temp_work   Temporary tables used during data load processing
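As a sketch of how such tablespaces are tied to physical datafiles (the file names, sizes and disk paths below are invented; only the statement form is Oracle's):

    -- Hypothetical layout: one datafile per tablespace, each on its own disk
    CREATE TABLESPACE data
      DATAFILE '/disk04/oradata/prod/data01.dbf' SIZE 500M;

    CREATE TABLESPACE indexes
      DATAFILE '/disk05/oradata/prod/indexes01.dbf' SIZE 250M;

    CREATE TABLESPACE rbs
      DATAFILE '/disk03/oradata/prod/rbs01.dbf' SIZE 200M;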

Page 47: Improving Database Performance

(A materialised view stores replicated data based on an underlying query - the data is replicated from within the current database. A Snapshot, by contrast, stores data from a remote database.

The optimiser may choose to use a materialised view instead of a query against a larger table if the materialised view will return the same data, and thus improve response time. A materialised view does, however, incur an overhead of additional space usage and maintenance.)
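A minimal sketch of a materialised view (the table and column names are invented; the statement form and the query rewrite option are standard Oracle):

    -- Pre-compute order totals so the optimiser can rewrite matching queries
    CREATE MATERIALIZED VIEW order_totals_mv
      BUILD IMMEDIATE
      REFRESH COMPLETE ON DEMAND
      ENABLE QUERY REWRITE
    AS
      SELECT customer_id, SUM(order_total) AS total_value
      FROM   orders
      GROUP  BY customer_id;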

Page 48: Improving Database Performance

Each of the tablespaces will require a separate datafile

Monitoring of I/O performance among datafiles can only be done after the database has been created, so at the planning stage the DBA must estimate the I/O load for each datafile (based on what information ?)

The physical layout planning is commenced by estimating the relative I/O among the datafiles, with the most active tablespace given a weight of 100.

Estimate the I/O from the other datafiles relative to the most active datafile
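Once the database is running, those estimates can be checked against actual activity; one common way (a sketch, assuming access to the V$ performance views) is to query V$FILESTAT joined to V$DATAFILE:

    -- Physical reads and writes recorded per datafile since instance startup
    SELECT d.name, f.phyrds, f.phywrts
    FROM   v$datafile d, v$filestat f
    WHERE  d.file# = f.file#
    ORDER  BY (f.phyrds + f.phywrts) DESC;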

Page 49: Improving Database Performance

Assign a weight of 35 to the System tablespace files, and give the index tablespaces a value of one third of their data tablespaces

RBS may go as high as 70 (depending on the database activity) - between 30 and 50 is 'normal'

In production, Temp will be used by large sorts

Tools will be used rarely in production - as will the Tools_1 tablespace

Page 50: Improving Database Performance

So, what do we have ? - Something like this -

Tablespace     Weight    % of Total
Data             100         45
Rbs               40         18
System            35         16
Indexes           33         15
Temp               5          2
Data_2             4          2
Indexes_2          2          1
Tools              1          1
               (220)
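Each percentage is simply that tablespace's weight divided by the total of 220 - for example Data is 100/220 ≈ 45% and Rbs is 40/220 ≈ 18%.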

Page 51: Improving Database Performance

94% of the Input/Output is associated with the top four tablespaces

This indicates that, in order to separate the datafile activity properly, 5 disks would be needed, AND that NO other database files should be put on the disks which are accommodating the top 4 tablespaces

There are some rules which apply :

1. Data tablespaces should be stored separately from their Index tablespaces

2. RBS tablespaces should be stored separately from their Data tablespaces

Page 52: Improving Database Performance

and 3. The System tablespace should be stored separately from the other tablespaces in the database

In my example, there is only 1 Data tablespace. In production databases there will probably be many Data tablespaces (which will happen if Partitions are used).

If/when this occurs, the weightings of each of the Data tablespaces will need to be estimated (but for my efforts, 1 Data tablespace will be used).

Page 53: Improving Database Performance

As you have probably guessed, there are other files which need to be considered - many of them used by the various 'processes' of Oracle

One of these considerations is the on-line redo log files (you remember these and their purpose ?)

They store the records of each transaction. Each database must have at least 2 online redo log files available to it - the database will write to one log in sequential mode until the redo log file is filled, then it will start writing to the second redo log file.

Page 54: Improving Database Performance

Redo log files (cont’d)

The Online Redo Log files maintain data about current transactions and they cannot be recovered from a backup unless the database is/was shut down prior to backup - this is a requirement of the ‘Offline Backup’ procedure (if we have time we will look at this)

On line redo log files need to be ‘mirrored’

A method of doing this is to employ redo log groups - which dynamically maintain multiple sets of the online redo logs

The operating system is also a good ally for mirroring files
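A sketch of mirroring through redo log groups (the group number, file paths and size are invented; the statement itself is standard Oracle):

    -- Each group keeps two identical members, placed on different disks
    ALTER DATABASE ADD LOGFILE GROUP 4
      ('/disk08/oradata/prod/redo04a.log',
       '/disk09/oradata/prod/redo04b.log') SIZE 100M;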

Page 55: Improving Database Performance

Redo log files should be placed away from datafiles because of the performance implications, and this means knowing how the 2 types of files are used

Every transaction (unless it is tagged with the nologging parameter) is recorded in the redo log files

The entries are written by the LogWriter (LGWR) process

The data in the transaction is concurrently written to a number of tablespaces (the RBS rollback segments and the Data tablespace come to mind) via the DataBase Writer (DBWR), and this raises possible contention issues if a datafile is located on the same disk as a redo log file
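For completeness, a small sketch of how the nologging behaviour mentioned above is requested (the object names are invented):

    -- Index builds and direct-path loads can skip most redo generation
    CREATE INDEX orders_cust_idx ON orders (customer_id) NOLOGGING;

    ALTER TABLE orders NOLOGGING;
    INSERT /*+ APPEND */ INTO orders
      SELECT * FROM orders_staging;   -- direct-path insert into a NOLOGGING table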

Page 56: Improving Database Performance

Redo log files are written sequentially

Datafiles are written in ‘random’ order - it is a good move to have these 2 different demands separated

If a datafile must be stored on the same disk as a redo log file, then it should not belong to the System tablespace, the RBS tablespace, or a very active Data or Index tablespace

So what about Control Files ?

There is much less traffic here, and they can be internally mirrored (via the config.ora or init.ora file). The database will maintain the control files as identical copies of each other.

There should be 3 copies, across 3 disks
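A sketch of what that internal mirroring looks like in the parameter file (the paths are invented; the CONTROL_FILES parameter is Oracle's):

    # init.ora - three identical control file copies on three different disks
    control_files = ("/disk11/oradata/prod/control01.ctl",
                     "/disk12/oradata/prod/control02.ctl",
                     "/disk13/oradata/prod/control03.ctl")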

Page 57: Improving Database Performance

The LGWR background process writes to the online redo files in a cyclical manner

When the 1st redo log file is full, it directs writing to the 2nd file ...

When the 'last' file is full, LGWR starts overwriting the contents of the 1st file .. and so on

When ARCHIVELOG mode is used, the contents of the file about to be overwritten are first copied to an archived redo log file on a disk device
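A sketch of turning this on (the archive destination path is invented; the parameters and statements are Oracle's):

    -- In the parameter file:
    --   log_archive_dest  = /disk21/arch
    --   log_archive_start = true
    -- Then, with the instance started in MOUNT mode (not open):
    ALTER DATABASE ARCHIVELOG;
    ALTER DATABASE OPEN;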

Page 58: Improving Database Performance

There will be contention on the online redo log as LGWR will be attempting to write to one redo log file while the Archiver (ARCH) will be trying to read another.

The solution is to distribute the redo log files across multiple disks

The archived redo log files are high I/O and therefore should NOT be on the same device as System, Rbs, Data, or Indexes tablespaces

Neither should they be stored on the same device as any of the online redo log files.

Page 59: Improving Database Performance

The database will stall if there is not enough disk space, and the archived files should be directed to a disk which contains small and preferably static files

Concurrent I/O

Avoiding contention here is a commendable goal, and one which needs careful planning to achieve.

Placing two random access files which are never accessed at the same time on the same disk will quite happily avoid contention for I/O capacity

Page 60: Improving Database Performance

What we have just covered is known as

1. Concurrent I/O - when concurrent processes are being performed against the same device (disk)

This is overcome by isolating data tables from their Indexes for instance

2. Interference - when sequential writing is interfered with by reads or writes to other files on the same disk

Page 61: Improving Database Performance

At the risk of labouring this a bit,

The 3 background processes to watch are

1. DBWR, which writes in a random manner

2. LGWR, which writes sequentially

3. ARCH, which reads and writes sequentially

LGWR and ARCH write to 1 file at a time, but DBWR may be attempting to write to multiple files at once - (can you think of an example ?)

Multiple DBWR processes for each instance, or multiple I/O slaves for the single DBWR, are possible solutions
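Both of those options are set in the parameter file; a sketch (the values are invented and would be tuned to the host - the two parameters should not normally be combined):

    # init.ora - pick one approach
    db_writer_processes = 4     # multiple DBWn background processes
    # dbwr_io_slaves    = 4     # or: I/O slaves serving a single DBWR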

Page 62: Improving Database Performance

What are the disk layout goals ?

Are they (1) recoverability or (2) performance ?

Recoverability must address all processes which impact disks (the storage areas for archived redo logs and for Export dump files - which so far we haven't mentioned - come to mind).

Performance calls for an understanding of file I/O demands and the relative speeds of the disk drives

Page 63: Improving Database Performance

What are some recoverability issues ?

All critical database files should be placed on mirrored drives, and the database run in ARCHIVELOG mode

The online redo log files must also be mirrored (by the operating system or by mirrored redo log groups)

Recoverability issues involve a few disks

and this is where we start to look at hardware specification

Page 64: Improving Database Performance

Mirroring architecture leads to specifying
– the number of disks required
– the models of disks (capacity and speed)
– the strategy
– if the hardware system is heterogeneous, the faster drives should be dedicated to Oracle database files
– RAID systems should be carefully analysed as to their capability and the optimum benefit sought - RAID-1, RAID-3 and RAID-5 have different processes relating to parity

Page 65: Improving Database Performance

The disks chosen for mirroring architecture must be dedicated to the database

This guarantees that non-database load on these disks will not interfere with database processes

Page 66: Improving Database Performance

Goals for disk layout :
– The database must be recoverable
– The online redo log files must be mirrored via the operating system or the database
– The database file I/O weights must be estimated
– Contention between DBWR, LGWR and ARCH must be minimised
– Contention between disks for DBWR must be minimised
– The performance goals must be defined
– The disk hardware options must be known
– The disk mirroring architecture must be known
– Disks must be dedicated to the database

Page 67: Improving Database Performance

So where does that leave us ?

We’re going to look at ‘solutions’ from Optimal to Practical

and we’ll assume that :

the disks are dedicated to the database

the online redo log files are being mirrored by the Operating System

the disks are of identical size

the disks have identical performance characteristics

(obviously the best case scenario !)

Page 68: Improving Database Performance

So, with that optimistic outlook let’s proceed

Case 1 - The Optimum Physical Layout

Disk  Contents                             Disk  Contents
1     Oracle software                      12    Control file 2
2     SYSTEM tablespace                    13    Control file 3
3     RBS tablespace                       14    Application software
4     DATA tablespace                      15    RBS_2 tablespace
5     INDEXES tablespace                   16    DATA_2 tablespace
6     TEMP tablespace                      17    INDEXES_2 tablespace
7     TOOLS tablespace                     18    TEMP_USER tablespace
8     Online Redo Log 1                    19    TOOLS_1 tablespace
9     Online Redo Log 2                    20    USERS tablespace
10    Online Redo Log 3                    21    Archived redo log destination disk
11    Control file 1                       22    Export dump file destination disk

Page 69: Hardware Configurations

• The 22 disk solution is an optimal solution.

• It may not be feasible for a number of reasons, including hardware costs

• In the following overheads there will be efforts to reduce the number of disks, commensurate with preserving performance

Page 70: Hardware Configurations

This leads to a 17 disk configuration

Disk  Contents                             Disk  Contents
1     Oracle software                      11    Application software
2     SYSTEM tablespace                    12    RBS_2 tablespace
3     RBS tablespace                       13    DATA_2 tablespace
4     DATA tablespace                      14    INDEXES_2 tablespace
5     INDEXES tablespace                   15    TEMP_USER tablespace
6     TEMP tablespace                      16    Archived redo log destination disk
7     TOOLS tablespace                     17    Export dump file destination disk
8     Online Redo Log 1, Control file 1
9     Online Redo Log 2, Control file 2
10    Online Redo Log 3, Control file 3

Page 71: Hardware Configurations

The Control Files are candidates for placement onto the three redo log disks. The altered arrangement reflects this.

The Control files will interfere with the online redo log files, but only at log switch points and during recovery

Page 72: Hardware Configurations

The TOOLS_1 tablespace will be merged with the TOOLS tablespace

In a production environment, users will not have resource privileges, and the USERS tablespace can be ignored

However, what will be the case if users require development and test access ?

Create another database ? (test ?)

Page 73: Hardware Configurations

Of the rollback tablespaces, RBS_2 holds the special rollback segments used only during data loading.

Data loads should not occur during production usage, and so if the 17 disk option is not practical, we can look at combining RBS and RBS_2 - there should be no contention

TEMP and TEMP_USER can be placed on the same disk

The TEMP tablespace weighting (5 in the previous table) can vary. It should be possible to store these 2 tablespaces on the same disk.

TEMP_USER is dedicated to a specific user (such as Oracle Financials) whose temporary segment requirements are greater than those of the system's other users

Page 74: Hardware Configurations

The revised solution is now (15 disks)

Disk  Contents                             Disk  Contents
1     Oracle software                      11    Application software
2     SYSTEM tablespace                    12    DATA_2 tablespace
3     RBS, RBS_2 tablespaces               13    INDEXES_2 tablespace
4     DATA tablespace                      14    Archived redo log destination disk
5     INDEXES tablespace                   15    Export dump file destination disk
6     TEMP, TEMP_USER tablespaces
7     TOOLS tablespace
8     Online Redo Log 1, Control file 1
9     Online Redo Log 2, Control file 2
10    Online Redo Log 3, Control file 3

Page 75: Hardware Configurations

What if there aren’t 15 disks ? -->> Move to attempt 3

Here the online Redo Logs will be placed onto the same disk. Where there are ARCHIVELOG backups, this will cause concurrent I/O and interference contention between LGWR and ARCH on that disk

What we can deduce from this, is that the combination about to be proposed is NOT appropriate for a high transaction system or systems running in ARCHIVELOG mode

(why is this so - Prof. Julius Sumner Miller ?)

Page 76: Hardware Configurations

The 'new' solution (12 disks) -

Disk  Contents
1     Oracle software
2     SYSTEM tablespace, Control file 1
3     RBS, RBS_2 tablespaces, Control file 2
4     DATA tablespace, Control file 3
5     INDEXES tablespace
6     TEMP, TEMP_USER tablespaces
7     TOOLS, INDEXES_2 tablespaces
8     Online Redo Logs 1, 2 and 3
9     Application software
10    DATA_2 tablespace
11    Archived redo log destination disk
12    Export dump file destination disk

Page 77: Hardware Configurations

Notice that the Control Files have been moved to Disks 2, 3 and 4

The Control Files are not I/O demanding, and can safely coexist with SYSTEM, RBS and DATA

What we have done so far is to 'move' the contents of the high numbered disks onto the 'low' numbered disks - these hold the most critical files in the database.

The next attempt to ‘rationalise’ the disk arrangement is to look carefully at the high numbered disks.

Page 78: Hardware Configurations

– DATA_2 can be combined with the TEMP tablespaces (this disk has 4% of the I/O load).

– This should be safe as the static tables (which ones are those ?) are not as likely to have group operations performed on them as the ones in the DATA tablespace

– The Export dump files have been moved to the Online Redo disk (the Redo log files are about 100 MB and don't increase in size - is that correct ?). Exporting causes minor transaction activity.

– The other change is the combination of the application software with the archived redo log file destination area. This leaves ARCH space to write log files, and avoids conflicts with DBWR

Page 79: Hardware Configurations

(9 disks)

Disk  Contents
1     Oracle software
2     SYSTEM tablespace, Control file 1
3     RBS, RBS_2 tablespaces, Control file 2
4     DATA tablespace, Control file 3
5     INDEXES tablespace
6     TEMP, TEMP_USER, DATA_2 tablespaces
7     TOOLS, INDEXES_2 tablespaces
8     Online Redo Logs 1, 2 and 3, Export dump file destination disk
9     Application software, Archived redo log destination disk

Page 80: Hardware Configurations

Can the number of required disks be further reduced ?

Remember that the performance characteristics will deteriorate

It’s now important to look closely at the weights set during the I/O estimation process.

Page 81: Hardware Configurations

Estimated weightings for the previous (9 disk) solution are :

Disk  Weight  Contents
1       -     Oracle software
2      35     SYSTEM tablespace, Control file 1
3      40     RBS, RBS_2 tablespaces, Control file 2
4     100     DATA tablespace, Control file 3
5      33     INDEXES tablespace
6       9     TEMP, TEMP_USER, DATA_2 tablespaces
7       3     TOOLS, INDEXES_2 tablespaces
8      40+    Online Redo Logs 1, 2 and 3, Export dump file destination disk
9      40+    Application software, Archived redo log destination disk

Page 82: Hardware Configurations

A further compromise distribution could be :

Disk  Weight  Contents
1       -     Oracle software
2      38     SYSTEM, TOOLS, INDEXES_2 tablespaces, Control file 1
3      40     RBS, RBS_2 tablespaces, Control file 2
4     100     DATA tablespace, Control file 3
5      42     INDEXES, TEMP, TEMP_USER, DATA_2 tablespaces
6      40+    Online Redo Logs 1, 2 and 3, Export dump file destination disk
7      40+    Application software, Archived redo log destination disk

Page 83: Hardware Configurations

A few thoughts for a small database system - 3 disks

1. Suitable for an OLTP application. Assumes that the transactions are small in size, large in number and variety, and randomly scattered among the available tables.

The application should be as index intensive as possible, and the full table scans must be kept to the minimum possible

Page 84: Hardware Configurations

2. Isolate the SYSTEM tablespace. This stores the data dictionary - which is accessed, often many times, for every query

In a 'typical case', query execution requires checking
– the column names in the CODES_TABLE table
– the user's privilege of access to the CODES_TABLE table
– the user's privilege to access the Code column of the CODES_TABLE table
– the user's role definition(s)
– the indexes defined on the CODES_TABLE table
– the columns defined on the CODES_TABLE table
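Those lookups are answered from the data dictionary; a sketch of the kind of information involved, expressed as queries a DBA could run by hand (CODES_TABLE is the lecture's example table; the dictionary views are Oracle's):

    -- Columns and their datatypes
    SELECT column_name, data_type
    FROM   all_tab_columns
    WHERE  table_name = 'CODES_TABLE';

    -- Indexes defined on the table
    SELECT index_name
    FROM   all_indexes
    WHERE  table_name = 'CODES_TABLE';

    -- Privileges granted on the table
    SELECT grantee, privilege
    FROM   all_tab_privs
    WHERE  table_name = 'CODES_TABLE';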

Page 85: Hardware Configurations

3. Isolate the INDEXES tablespace. This probably accounts for 35 to 40% of the I/O

4. Separate the rollback segments and DATA tablespaces

There is a point to watch here - with 3 disks there are 4 tablespaces - SYSTEM, INDEXES, DATA and RBS.

The placement of RBS is determined by the volume of transactions. If high, RBS and DATA should be kept apart.

If low, RBS and DATA should work together without causing contention

Page 86: Hardware Configurations

The 3 disk layout would be one of these :

Disk 1: SYSTEM tablespace, control file, redo log

Disk 2 : INDEXES tablespace, control file, redo log, RBS tablespace

Disk 3 : DATA tablespace, control file, redo log

OR

Disk 1 : SYSTEM tablespace, control file, redo log

Disk 2 : INDEXES tablespace, control file, redo log

Disk 3 : DATA tablespace, control file, redo log, RBS tablespace

Page 87: Hardware Configurations

Summary :

Database Type : Small development database
Tablespaces   : SYSTEM, DATA, INDEXES, RBS, TEMP, USERS, TOOLS

Page 88: Hardware Configurations

Summary :

Database Type : Production OLTP database
Tablespaces   : SYSTEM, DATA, DATA_2, INDEXES, INDEXES_2, RBS, RBS_2, TEMP, TEMP_USER, TOOLS

Page 89: Hardware Configurations

Summary :

Database Type : Production OLTP database with historical data
Tablespaces   : SYSTEM, DATA, DATA_2, DATA_ARCHIVE, INDEXES, INDEXES_2, INDEXES_ARCHIVE, RBS, RBS_2, TEMP, TEMP_USER, TOOLS