Unit - 4 Introduction to the Other Databases


PARALLEL DATABASE SYSTEM

Introduction :-

Today, a single-CPU architecture is not capable enough for modern databases, which must handle more demanding and complex user requirements: for example, high performance, increased availability, distributed access to data, analysis of distributed data, and so on.

To meet these requirements, modern database systems operate on architectures in which multiple CPUs work in parallel to provide database services.

In some of these architectures, the CPUs are physically located close together, in the same building, and communicate at very high speed.

Databases operating in such an environment are called Parallel Databases.

In a parallel database system, multiple CPUs work in parallel to improve performance through parallel implementation of operations such as loading data, building indexes and evaluating queries.

Parallel processing divides a large task into many smaller tasks and executes the smaller tasks concurrently on several CPUs.

As a result, the larger task completes more quickly.

Parallel database systems improve processing and I/O speed by using multiple CPUs and disks working in parallel.

Parallel databases are especially useful for applications that have to query large databases and process a large number of transactions per second.

In parallel processing many operations are performed simultaneously, as opposed to centralized processing, in which computation is performed serially.
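The idea of dividing a large task into smaller tasks that run concurrently can be sketched in a few lines of Python. This is only an illustration: the helper names `split` and `parallel_sum` are made up for this example, and worker threads stand in for CPUs (in CPython, threads show the structure of the technique rather than a real CPU-level gain).

```python
from concurrent.futures import ThreadPoolExecutor

def split(items, parts):
    """Split items into `parts` nearly equal contiguous chunks."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks

def parallel_sum(items, workers=4):
    """Sum each chunk on its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(sum, split(items, workers)))
    return sum(partial)

records = list(range(1, 10001))   # hypothetical "large task": 10,000 values
total = parallel_sum(records)
```

The large task (summing all records) is held constant; only the way it is carved up and scheduled changes.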

The goal of Parallel Database System :-

To ensure that the database system continues to perform at an acceptable speed even as the size of the database and the number of transactions increase.

This is done by increasing the capacity of the system through added parallelism, which provides a smoother path for growth for an enterprise than replacing a centralized system with a faster machine.

Parallel database systems are usually designed to provide the best cost-performance, and they are quite uniform in site machine architecture.

The cooperation between site machines is usually achieved at the level of the transaction module of the database system.

A parallel database system represents an attempt to construct a faster centralized computer using several small CPUs.

WHY DO WE NEED THEM?

• More and More Data!

We have databases that hold a huge amount of data, on the order of 10^12 bytes:

1,000,000,000,000 bytes!

• Faster and Faster Access!

We have data applications that need to process data at very high speeds:

10,000s of transactions per second!

A SINGLE-PROCESSOR DBMS CANNOT DO THIS JOB!

Advantages of Parallel Database System :-

• Increased throughput (Scale-Up).
• Improved response time (Speed-Up).
• Useful for applications that query extremely large databases and process an extremely large number of transactions (on the order of thousands of transactions per second).
• Increased availability of the system.
• Greater flexibility.
• Possible to serve a large number of users.

Disadvantages of Parallel Database System :-

• Higher start-up cost.
• Interference problem.

BENEFITS OF A PARALLEL DBMS

INTERQUERY PARALLELISM

It is possible to process a number of transactions in parallel with each other.

Improves Throughput.

INTRAQUERY PARALLELISM

It is possible to process 'sub-tasks' of a transaction in parallel with each other.

Improves Response Time.

HOW TO MEASURE THE BENEFITS

Speed-Up. As you multiply resources by a certain factor, the time taken to execute a transaction should be reduced by the same factor:

10 seconds to scan a DB of 10,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs

Scale-Up. As you multiply resources, the size of the task that can be executed in a given time should increase by the same factor:

1 second to scan a DB of 1,000 records using 1 CPU
1 second to scan a DB of 10,000 records using 10 CPUs

SPEED-UP (figure): number of transactions per second plotted against number of CPUs. Linear speed-up (ideal) rises from 1000/sec at 5 CPUs to 2000/sec at 10 CPUs.

SCALE-UP (figure): number of transactions per second plotted against number of CPUs and database size. Linear scale-up (ideal) holds 1000/sec from 5 CPUs with a 1 GB database to 10 CPUs with a 2 GB database.

ARCHITECTURE OF PARALLEL DATABASE

1.) Shared-Memory Multiple CPU :-
2.) Shared-Disk Multiple CPU :-
3.) Shared-Nothing Multiple CPU :-

Shared-Memory Multiple CPU :-

In this system a computer has multiple simultaneously active CPUs that are attached to an interconnection network and share a single main memory.

Thus, in this architecture a single copy of a multithreaded operating system and a multithreaded DBMS can support multiple CPUs.

This architecture is the closest to the traditional single-CPU centralized database system, but it is much faster than a single CPU of the same power.


Shared Memory Multiple CPU Architecture

Benefits of Shared-Memory :-

Communication between CPUs is extremely efficient. Data can be accessed by any CPU without being moved by software. A CPU can send a message to another CPU much faster by using memory writes, which usually take less than a microsecond, than by sending a message through a communication mechanism.

Communication overhead is low, because main memory can be used for this purpose and operating system services can be used to utilize the additional CPUs.

Limitations of Shared-Memory :-

Memory access uses a very high-speed mechanism that is difficult to partition without losing efficiency. Thus, the design must take special care that the different CPUs have equal access to the common memory.

Since the communication bus or interconnection network is shared by all CPUs, this architecture does not scale beyond 80 or 100 CPUs in parallel. The bus or interconnection network becomes a bottleneck as the number of CPUs increases.

Adding more CPUs causes the CPUs to spend time waiting for their turn on the bus to access memory.

Shared-Disk Multiple CPU :-

In this system multiple CPUs are attached to an interconnection network. Each CPU has its own memory, but all of them have access to the same disk storage or, more commonly, to a shared array of disks.

The scalability of the system is largely determined by the capacity and throughput of the interconnection network.

Since main memory is not shared among the CPUs, each machine has its own OS and its own DBMS.

Because the same data is accessible to all nodes, two or more nodes may want to read or write the same data at the same time.

Therefore, a global locking scheme is required to preserve data integrity.
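Why a global lock is needed when several nodes update the same stored data can be shown with a small sketch. Everything here is a stand-in chosen for illustration: threads play the role of nodes, a dictionary plays the shared disk, and a single `threading.Lock` plays the global locking scheme.

```python
import threading

shared_disk = {"balance": 0}      # storage visible to every node
global_lock = threading.Lock()    # stand-in for a global locking scheme

def node(deposits):
    """One node applying a series of updates to the shared disk."""
    for amount in deposits:
        with global_lock:         # only one node may read-modify-write at a time
            current = shared_disk["balance"]
            shared_disk["balance"] = current + amount

# Four hypothetical nodes, each depositing 1 unit a thousand times.
nodes = [threading.Thread(target=node, args=([1] * 1000,)) for _ in range(4)]
for t in nodes:
    t.start()
for t in nodes:
    t.join()
```

Without the lock, two nodes could read the same old balance and one update would be lost; with it, every read-modify-write is applied in full.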


Shared Disk Multiple CPU Architecture

Benefits of Shared Disk Architecture :-

Easy to load-balance, because data does not have to be permanently divided among the available CPUs.

Since each CPU has its own memory, the memory bus is not a bottleneck.

It offers a low-cost way to provide a degree of fault tolerance. In case of a CPU or memory failure, the other CPUs can take over its tasks, since the database resides on disks that are accessible from all CPUs.

It has found acceptance in a wide range of applications.

Limitations of Shared Disk :-

It also faces interference and memory contention bottlenecks as the number of CPUs increases. As more CPUs are added, the existing CPUs are slowed down by the increased contention for memory accesses and network bandwidth.

It also has a scalability problem. The interconnection to the disk subsystem becomes a bottleneck, particularly when the database makes a large number of disk accesses.

Shared Nothing Multiple CPU :-

In this system multiple CPUs are attached to an interconnection network. Each CPU has its own local memory and local disk storage, and no two CPUs can access the same storage area.

All communication between CPUs goes through a high-speed interconnection network.

Thus, a shared-nothing environment involves no sharing of memory or disk.

Each CPU has its own copy of the OS, its own copy of the DBMS, and its own portion of the data managed by the DBMS.

In this type of architecture, the CPUs that share responsibility for database services usually split up the data among themselves.

The CPUs then perform transactions and queries by dividing up the work and communicating by messages over the high-speed network.
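A rough sketch of how shared-nothing nodes might split data and cooperate on a query is given below. The placement rule (`order_id` modulo the node count), the sample rows and the function names are all assumptions made for this example; lists of partial results stand in for messages on the network.

```python
from collections import Counter

NUM_NODES = 3

def owner(key):
    """Node that stores `key` (assumed rule: simple modulo placement)."""
    return key % NUM_NODES

# Hypothetical table of (order_id, customer_id) rows.
rows = [(order_id, order_id % 5) for order_id in range(20)]

# Each node holds only its own partition; no node sees another node's rows.
partitions = {n: [] for n in range(NUM_NODES)}
for row in rows:
    partitions[owner(row[0])].append(row)

def local_count(node_rows):
    """Work done locally on one node: number of orders per customer."""
    return Counter(customer for _, customer in node_rows)

# Only the small partial results travel over the "network" to be merged.
messages = [local_count(partitions[n]) for n in range(NUM_NODES)]
result = sum(messages, Counter())
```

Each node scans only its local partition, and the merge step touches just one small counter per node rather than the full table.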


Shared Nothing Multiple CPU Architecture

Benefits of Shared Nothing Architecture :-

This architecture minimizes contention among CPUs by not sharing resources, and therefore offers a high degree of scalability.

Since local disk references are serviced by the local disk at each CPU, this architecture overcomes the limitation of requiring all I/O to go through a single interconnection network. Only queries, accesses to non-local disks and result relations pass through the network.

The interconnection networks for this architecture are usually designed to be scalable. Thus, adding more CPUs and more disks enables the system to grow in proportion to the power and capacity of the newly added components.

In other words, the shared-nothing architecture provides linear Speed-Up and linear Scale-Up.

Because of these properties, the transmission capacity of a shared-nothing architecture increases as more nodes are added, and it can therefore easily support a large number of CPUs.

Limitations of Shared Nothing Architecture :-

Shared-nothing architectures are difficult to load-balance. In many multi-CPU environments it is necessary to split the system workload so that all system resources are used efficiently. Balancing the workload across a shared-nothing system requires an administrator to partition the data properly across the various disks. In practice this is difficult to achieve.

Adding a new CPU and disk to a shared-nothing architecture means that the data may need to be redistributed in order to take advantage of the new resources, which requires more extensive reorganization of the DBMS.
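How much data has to move when a node is added depends on the placement rule. The sketch below assumes a simple modulo rule, which is an illustration only and not something the slides specify, and counts how many of 1,200 hypothetical keys change owner when the system grows from 3 to 4 nodes.

```python
NUM_KEYS = 1200

def owner(key, num_nodes):
    """Assumed placement rule: node index is the key modulo the node count."""
    return key % num_nodes

before = {k: owner(k, 3) for k in range(NUM_KEYS)}   # 3 nodes
after = {k: owner(k, 4) for k in range(NUM_KEYS)}    # one node added

moved = sum(1 for k in range(NUM_KEYS) if before[k] != after[k])
fraction_moved = moved / NUM_KEYS
```

Under this particular rule most keys change owner, which gives a feel for why adding hardware to a shared-nothing system can require extensive reorganization.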

The costs of communication and non-local disk access are higher than in Shared-Disk or Shared-Memory architectures, because sending data involves software interaction at both ends.

High-speed networks are limited in size because of speed-of-light considerations. This leads to the requirement that a parallel architecture has CPUs that are physically close together.

It requires an OS capable of handling the heavy volume of messaging required to support inter-processor communication.

KEY ELEMENTS OF PARALLEL DATABASE PROCESSING

1.) Speed-Up :-
2.) Scale-Up :-
3.) Synchronization :-
4.) Locking :-

1.) Speed-Up :-

Speed-Up is the property in which the time taken to perform a task decreases as the number of CPUs increases.

In other words, Speed-Up is the property of running a given task in less time by increasing the degree of parallelism (more hardware).

With additional hardware, Speed-Up holds the task constant and measures the time saved.

Thus, Speed-Up enables users to improve the system response time for their queries, assuming the size of their database remains the same.

Speed-Up = To / Tp, where

To = execution time of a task on the original or smaller machine (original processing time)

Tp = execution time of the same task on the parallel or larger machine (parallel processing time).

Here the original processing time To is the time spent by a centralized or small system on the given task, and the parallel processing time Tp is the time spent by the large or parallel system on the same task.

Consider a database application running on a parallel system with a certain number of CPUs and disks.

Now suppose the size of the system is increased by adding CPUs, disks and other hardware components.

The goal is to process the task in time inversely proportional to the number of CPUs and disks allocated.

For example, if the original system takes 60 seconds to perform the task and the parallel system (with double the capacity) takes 30 seconds to complete the same task, then Speed-Up = 60/30 = 2. With twice the resources, a Speed-Up value of 2 indicates linear Speed-Up.

In general, Speed-Up is linear if it equals N when the larger system has N times the resources of the smaller system.

If the Speed-Up value is less than N, the system is said to demonstrate sub-linear Speed-Up.
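The Speed-Up arithmetic can be checked with a short sketch. The function names are invented for this example, and the 40-second measurement is a hypothetical value added to show the sub-linear case; only the 60-second and 30-second figures come from the text.

```python
import math

def speed_up(t_original, t_parallel):
    """Ratio of original processing time to parallel processing time."""
    return t_original / t_parallel

def classify(value, resource_factor):
    """Compare a Speed-Up value with N, the factor by which resources grew."""
    if math.isclose(value, resource_factor):
        return "linear"
    if value < resource_factor:
        return "sub-linear"
    return "above linear"   # case not named in the text

doubled = speed_up(60, 30)   # example from the text: capacity doubled
slower = speed_up(60, 40)    # hypothetical: same doubled system, 40 seconds
labels = (classify(doubled, 2), classify(slower, 2))
```

With resources doubled, 60/30 = 2 is linear, while 60/40 = 1.5 falls short of N = 2 and is sub-linear.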

SPEED-UP (figure): number of transactions per second plotted against number of CPUs. Linear speed-up (ideal) rises from 1000/sec at 5 CPUs to 2000/sec at 10 CPUs; a second curve shows sub-linear speed-up.

2.) Scale-Up :-

Scale-Up is the property in which the performance of the parallel database is sustained when the number of CPUs and disks is increased in proportion to the amount of data.

In other words, Scale-Up is the ability to handle a larger task in the same time as the original system by increasing the degree of parallelism.

With added hardware, the formula for Scale-Up holds the time constant and measures the increase in the size of the task.

Thus, Scale-Up enables users to increase the size of their database while maintaining the same response time.

Scale-Up = Vp / Vo, where

Vp = parallel or large processing volume

Vo = original or small processing volume.

Here the original processing volume is the transaction volume processed in a given amount of time on the smaller system, and the parallel processing volume is the transaction volume processed in the same amount of time on the larger system.

For example, if the original system can process 3000 transactions in a given amount of time and the parallel system can process 6000 transactions in the same amount of time, then Scale-Up = 6000/3000 = 2.

If the parallel system has twice the hardware, a Scale-Up value of 2 indicates linear Scale-Up: twice as much hardware processes twice the data volume in the same amount of time.

If the Scale-Up value is less than 2, it is called sub-linear Scale-Up.

In general, if the resources of the parallel system are increased N times, linear Scale-Up means the Scale-Up value is also N.
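The Scale-Up example works out the same way as a ratio of volumes processed in a fixed time. In this sketch the function name is made up, and the 4500-transaction figure is a hypothetical measurement added to show the sub-linear case; the 3000 and 6000 figures are from the text.

```python
def scale_up(v_parallel, v_original):
    """Ratio of volume processed on the larger system to the smaller one,
    both measured over the same amount of time."""
    return v_parallel / v_original

doubled = scale_up(6000, 3000)     # example from the text, hardware doubled
shortfall = scale_up(4500, 3000)   # hypothetical result on the same doubled hardware

is_linear = doubled == 2           # matches the factor by which hardware grew
is_sub_linear = shortfall < 2      # falls short of that factor
```

Here time is held constant and the size of the task varies, which is the mirror image of the Speed-Up measurement.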

SCALE-UP (figure): number of transactions per second plotted against number of CPUs and database size. Linear scale-up (ideal) holds 1000/sec from 5 CPUs with a 1 GB database to 10 CPUs with a 2 GB database; a second curve shows sub-linear scale-up.

3.) Synchronization :-

Synchronization is the coordination of concurrent tasks. It is necessary for correctness.

For successful operation of a parallel database system, tasks should be divided so that the synchronization requirement is small.

With less synchronization, better Speed-Up and Scale-Up can be achieved.

The amount of synchronization depends on the amount of resources and on the number of users and tasks working on those resources.

More synchronization is required to coordinate a large number of concurrent tasks.

4.) Locking :-

Locking is a method of synchronizing concurrent tasks.

Both internal and external locking mechanisms are used to synchronize the tasks required by a parallel database system.

For external locking, a distributed lock manager (DLM) is used, which is a part of the OS.

The DLM coordinates resource sharing between communicating nodes running a parallel server.

The instances of a parallel server use the DLM to communicate with each other and to coordinate modification of database resources. The DLM allows applications to synchronize access to resources such as data, software and devices, so that concurrent requests for the same resource are coordinated between applications running on different nodes.
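The bookkeeping a lock manager performs can be sketched with a toy lock table. This is a hypothetical, single-process class written for illustration; a real DLM also handles lock modes, waiting queues and messaging between nodes, none of which is modelled here.

```python
class LockManager:
    """Toy lock table: at most one owner per resource (exclusive locks only)."""

    def __init__(self):
        self._owners = {}   # resource name -> node currently holding the lock

    def acquire(self, node, resource):
        """Grant the lock if it is free or already held by `node`."""
        holder = self._owners.setdefault(resource, node)
        return holder == node

    def release(self, node, resource):
        """Release the lock only if `node` is the current holder."""
        if self._owners.get(resource) == node:
            del self._owners[resource]
            return True
        return False

dlm = LockManager()
first = dlm.acquire("node-A", "table:orders")    # granted
second = dlm.acquire("node-B", "table:orders")   # denied while node-A holds it
dlm.release("node-A", "table:orders")
third = dlm.acquire("node-B", "table:orders")    # granted after the release
```

The node and resource names are placeholders; the point is that concurrent requests for the same resource are decided in one place.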

QUERY PARALLELISM

1.) Intra-Query Parallelism :-
2.) Inter-Query Parallelism :-
3.) Intra-Operation Parallelism :-
4.) Inter-Operation Parallelism :-
5.) Input / Output Parallelism :-

1.) Intra-Query Parallelism :-

Intra-Query Parallelism refers to the execution of a single query in parallel on multiple CPUs, using the shared-nothing architecture technique.

It is sometimes called Parallel Query Processing.

For example, suppose a table has been partitioned across multiple disks by range partitioning on some attribute, and the user now wants to SORT on the partitioning attribute.

The SORT operation can be implemented by sorting each partition in parallel and then concatenating the sorted partitions to get the final sorted relation.

Thus, a query can be parallelized by parallelizing its individual operations.
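The range-partitioned SORT described above can be sketched as follows. The split points in `BOUNDARIES` and the sample values are hypothetical, and worker threads stand in for the CPUs that own each partition.

```python
from concurrent.futures import ThreadPoolExecutor

# Range partitioning on the sort attribute: every value in partition i
# is smaller than every value in partition i + 1.
BOUNDARIES = [100, 200]   # hypothetical split points

def partition_of(value):
    """Index of the range partition that holds `value`."""
    for i, bound in enumerate(BOUNDARIES):
        if value < bound:
            return i
    return len(BOUNDARIES)

values = [250, 17, 120, 99, 300, 101, 5, 199, 200]
partitions = [[] for _ in range(len(BOUNDARIES) + 1)]
for v in values:
    partitions[partition_of(v)].append(v)

# Sort every partition concurrently, one worker per partition.
with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    sorted_parts = list(pool.map(sorted, partitions))

# Because the ranges do not overlap, concatenation yields the sorted relation.
result = [v for part in sorted_parts for v in part]
```

No merge step is needed: the partitioning already guarantees the order between partitions, so only the order within each partition has to be established.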

Advantages :-

Intra-Query Parallelism speeds up long-running queries.

It is beneficial for decision support applications that issue complex, read-only queries, including queries involving multiple JOINs.

Figure: Query 1 executed across CPU 1, CPU 2, CPU 3, ..., CPU N, which are connected by an interconnection network.

2.) Inter-Query Parallelism :-

In Inter-Query Parallelism, multiple transactions are executed in parallel, one by each CPU.

It is sometimes also called Parallel Transaction Processing.

The primary use of Inter-Query Parallelism is to scale up a transaction processing system to support a large number of transactions per second.

To support Inter-Query Parallelism, the DBMS generally uses task or transaction dispatching.

Efficient lock management is another method used by the DBMS to support Inter-Query Parallelism, particularly in the Shared-Disk architecture.

Since each query runs sequentially in Inter-Query Parallelism, it does not help to speed up long-running queries.

In a Shared-Disk architecture, the DBMS must keep track of the locks held by different transactions executing on different CPUs in order to preserve data integrity.

Inter-Query Parallelism on a Shared-Disk architecture performs best when the transactions that execute in parallel do not access the same data.

Figure: Transactions 1 to N, each assigned to one of CPU 1, CPU 2, CPU 3, ..., CPU N, which are connected by an interconnection network.
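Transaction dispatching for Inter-Query Parallelism might look like the sketch below. The accounts, amounts and function name are hypothetical, worker threads stand in for CPUs, and the transactions are deliberately chosen so that no two touch the same account.

```python
from concurrent.futures import ThreadPoolExecutor

accounts = {"A": 100, "B": 50, "C": 75, "D": 20}

def run_transaction(txn):
    """Execute one transaction sequentially, start to finish, on one worker."""
    account, amount = txn
    accounts[account] += amount
    return account, accounts[account]

# Each transaction touches a different account, so they do not conflict.
transactions = [("A", 10), ("B", -5), ("C", 25), ("D", 30)]

# The dispatcher hands whole transactions to workers; none is split up.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(run_transaction, transactions))
```

More transactions finish per unit of time, but each individual transaction takes as long as it would on its own.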

Advantages :-

It is the easiest form of parallelism to support in a database system, particularly in a Shared-Memory parallel system.

It scales up a transaction processing system to support a large number of transactions per second.

Disadvantages :-

The response times of individual transactions are no faster than they would be if the transactions were run in isolation.

It is more complicated in Shared-Disk and Shared-Nothing architectures, where locking has to be coordinated across CPUs.

3.) Intra-Operation Parallelism :-

Intra-Operation Parallelism parallelizes the execution of each individual operation of a task, such as sorting, projection, join and so on.

Since the number of operations in a typical query is small compared to the number of tuples processed by each operation, Intra-Operation Parallelism scales better with increasing parallelism.

Advantages :-

Intra-Operation Parallelism is natural in a database.

The degree of parallelism is potentially enormous.
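Parallelizing a single operation over its tuples can be sketched with a selection. The tuples, the predicate (`score >= 4`) and the round-robin split over three workers are all assumptions made for this example.

```python
from concurrent.futures import ThreadPoolExecutor

tuples = [(i, i * 3 % 7) for i in range(12)]     # hypothetical (id, score) tuples
partitions = [tuples[i::3] for i in range(3)]    # round-robin split over 3 workers

def select(part):
    """The same selection (score >= 4) applied to one partition."""
    return [t for t in part if t[1] >= 4]

# One operation, many tuples: every worker runs the same selection
# on its own share of the input.
with ThreadPoolExecutor(max_workers=3) as pool:
    pieces = list(pool.map(select, partitions))

selected = sorted(t for piece in pieces for t in piece)
```

The available parallelism here is bounded by the number of tuples rather than the number of operations, which is why it can grow so large.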

4.) Inter-Operation Parallelism :-

In Inter-Operation Parallelism, the different operations in a query expression are executed in parallel.

The following two types of Inter-Operation Parallelism are used:
1. Pipelined Parallelism
2. Independent Parallelism

1. Pipelined Parallelism :-

In this parallelism, the output tuples of one operation A are consumed by a second operation B even before the first operation has produced its entire set of output tuples.

Thus, it is possible to run operations A and B simultaneously on different processors, so that operation B consumes tuples in parallel with operation A producing them.
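A pipeline between two operations can be sketched with a bounded queue and two threads. What operations A and B actually compute here (keeping even rows, then multiplying by 10) is invented for the example; the part that matters is that B starts consuming before A has finished producing.

```python
import queue
import threading

DONE = object()                  # end-of-stream marker
pipe = queue.Queue(maxsize=2)    # small buffer between the two operations
output = []

def operation_a(rows):
    """Producer: emits each qualifying tuple as soon as it is found."""
    for row in rows:
        if row % 2 == 0:
            pipe.put(row)
    pipe.put(DONE)

def operation_b():
    """Consumer: processes tuples while operation A is still running."""
    while True:
        row = pipe.get()
        if row is DONE:
            break
        output.append(row * 10)

a = threading.Thread(target=operation_a, args=(range(10),))
b = threading.Thread(target=operation_b)
a.start()
b.start()
a.join()
b.join()
```

The intermediate result never exists as a whole: at most two tuples sit in the buffer at any moment, so nothing has to be written out between the two operations.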

Advantages :-

Pipelined parallelism is useful with a small number of CPUs.

Pipelined execution also avoids writing intermediate results to disk.

Disadvantages :-

It does not scale up well.

Pipeline chains do not attain sufficient length to provide a high degree of parallelism.

It is not possible to pipeline relational operators that do not produce output until all inputs have been accessed.

Only marginal Speed-Up is obtained for the frequent case in which one operation's cost is much higher than the others'.

2. Independent Parallelism :-

In independent parallelism, the operations in a query expression that do not depend on one another can be executed in parallel.

Advantages :-

It is useful with a lower degree of parallelism.

Disadvantages :-

Like pipelined parallelism, independent parallelism does not provide a high degree of parallelism, so it is less useful in a highly parallel system.
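Independent parallelism can be sketched with two operations that share no input. In this example the tables, names and the final combining step are hypothetical: the two sorts do not depend on each other and are submitted together, and only the last step needs both results.

```python
from concurrent.futures import ThreadPoolExecutor

orders = [(1, "pen"), (2, "ink"), (1, "pad")]    # (customer_id, item)
customers = [(2, "Ravi"), (1, "Asha")]           # (customer_id, name)

def sort_orders():
    """First independent operation: sort the orders table."""
    return sorted(orders)

def sort_customers():
    """Second independent operation: sort the customers table."""
    return sorted(customers)

# Neither sort needs the other's output, so both can run at the same time.
with ThreadPoolExecutor(max_workers=2) as pool:
    f_orders = pool.submit(sort_orders)
    f_customers = pool.submit(sort_customers)
    sorted_orders = f_orders.result()
    sorted_customers = f_customers.result()

# The combining step depends on both results and runs afterwards.
names = dict(sorted_customers)
joined = [(names[cid], item) for cid, item in sorted_orders]
```

Only two operations are available to run side by side here, which matches the point that this form offers a low degree of parallelism.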