
Page 1: Progress Openedge performance management

Progress OE performance management

Yassine MOALLA

Progress OE application support

February 2017

Page 2: Progress Openedge performance management

Introduction

• Imagine you are the owner of a restaurant.

• You employ two chefs, one broker, and five servers.

• It is lunch time, and many clients are arriving.

• The broker assigns servers to clients.

• The servers welcome the clients, seat them, and take their orders.

• Clients are waiting… then waiting… then waiting… until they get angry and start leaving your restaurant.

• What is happening? Why are meals not prepared quickly? What are the chefs doing?

• You go to the kitchen and check.

Page 3: Progress Openedge performance management

Definition

• Application Performance Management (APM) is measuring application performance and tuning resources used by the application in order to maintain an expected level of service.

Page 4: Progress Openedge performance management

Performance management

QAD application performance depends on many factors:

• CPU: manipulates data and executes programs

• Memory: stores data so it can be accessed quickly during operations

• Disk controllers: read and write data to disk

• Operating system mechanisms: allocate and use system resources

• Network: exchanges data between clients and servers

• ….. and many other resources

Let us agree on some indicators of performance status.

Page 5: Progress Openedge performance management

CPU

• Let us imagine that the CPU is the chef.

• The chef manipulates ingredients (data) and prepares a plate according to a customer's order (a client's query).

Page 6: Progress Openedge performance management

• If clients are complaining and your chef is 100% busy, your chef is overloaded: you need to recruit another chef to help him.

CPU

Page 7: Progress Openedge performance management

• If clients are complaining and your chef is taking a rest (not busy), your chef might be waiting for something: identify the bottleneck* and eliminate it.

• *Performance bottlenecks occur when a resource performs inadequately and prevents other resources from accomplishing their work.

CPU

Page 8: Progress Openedge performance management

CPU threshold

• CPUs are overloaded when the run queue size is greater than the number of CPUs on the machine.

• CPUs are not busy when they are waiting or idle. Example of a high I/O read load (vmstat output):

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff   cache  si  so    bi   bo   in   cs us sy id wa st
 3  1 465872  36132  82588 1018364   7  17    70  127  214  838 12  3 82  3  0
 0  1 465872  33796  82620 1021820   0   0 34592    0  357  781  6 10  0 84  0
 0  1 465872  36100  82656 1019660   0   0 34340    0  358  723  5  9  0 86  0
 0  1 465872  35744  82688 1020416   0   0 33312    0  345  892  8 11  0 81  0
 0  1 465872  35716  82572 1020948   0   0 34592    0  358  738  7  8  0 85  0
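The run-queue rule above can be sketched as a small shell check. This is an illustration, not from the slides: it assumes Linux, where the fourth field of /proc/loadavg is the "runnable/total" task count.

```shell
# Compare the number of runnable tasks with the number of CPUs (Linux only).
cpus=$(nproc)
runq=$(cut -d' ' -f4 /proc/loadavg | cut -d/ -f1)
if [ "$runq" -gt "$cpus" ]; then
    status="overloaded"
else
    status="ok"
fi
echo "run queue=$runq cpus=$cpus -> $status"
```

A sustained run queue larger than the CPU count suggests adding CPU capacity; a short spike is normal.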

Page 9: Progress Openedge performance management

Disks controllers

• Disk I/O is a common bottleneck because reading and writing data to disk is a relatively slow operation.

• To come back to our example, imagine that every time the chef (CPU) needs an ingredient (data), he has to ask his assistant (database server) to go to the storage room (database) and bring the ingredient (data) back. You will agree that this way the chef spends a lot of time waiting for ingredients (data). The solution: minimize the time needed to retrieve the ingredients (data). But how?

Page 10: Progress Openedge performance management

• Storing the most useful ingredients (data) on a table (buffer pool) placed right next to the chef (CPU) is one way to minimize disk I/O time.

• Every time the chef uses ingredients (data) from buffer pool, we call that: Buffer hit.

• If the buffer hit ratio is around 99%, the chef finds the ingredients he needs 99% of the time: the -B database startup parameter is well tuned.

Memory/Buffers

Page 11: Progress Openedge performance management

Buffer hits

• Let us define:

• A data read from the buffer pool → logical read

• A data written to the buffer pool → logical write

• A data read from disk → OS read

• A data written to disk → OS write

• Buffer hits = ((logical reads + logical writes) − (OS reads + OS writes)) / (logical reads + logical writes)

• Buffer hits should be 99% or more. Why?

• 95% = 5 OS reads/writes out of 100 went to disk

• 99% = 1 OS read/write out of 100 went to disk

• Conclusion: 99% means five times fewer disk operations than 95%, so it is 500% better, not merely 4 percentage points better.
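The formula can be checked with shell arithmetic. The counter values below are hypothetical, standing in for numbers you might read from PROMON:

```shell
# Hypothetical sample counters (e.g. from PROMON buffer cache activity).
logical_reads=100000
logical_writes=20000
os_reads=1100
os_writes=100

total=$((logical_reads + logical_writes))
misses=$((os_reads + os_writes))
# Integer math, scaled by 100 to get a percentage.
hit_pct=$(( (total - misses) * 100 / total ))
echo "Buffer hit ratio: ${hit_pct}%"   # prints "Buffer hit ratio: 99%"
```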

Page 12: Progress Openedge performance management

• If the buffer hit ratio is below 99% most of the time, the table space (buffer pool) is not big enough to hold all the useful ingredients (data). The buffer pool must be enlarged: increase the -B database startup parameter.
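As a hedged illustration (the database path and value are hypothetical, not recommendations), -B is passed when the broker is started:

```shell
# Start the broker with a 500,000-block buffer pool.
# Size -B against available RAM: too large a value causes swapping.
proserve /db/production -B 500000
```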

Tuning Buffers

Page 13: Progress Openedge performance management

• The PRO of using buffers: improved performance, by decreasing disk I/O operations.

• The CON of using too many buffers: the system may start swapping, which itself causes disk I/O operations.

Be careful!!

Page 14: Progress Openedge performance management

• If the chef doesn't find an ingredient (data) he needs on the table (buffer pool), he asks his assistant (database server) to bring it. The assistant takes the least recently used (LRU) ingredient from the table back to the storage room (database) and replaces it with the required ingredient (data). This operation is known as eviction.

• Eviction requires two steps: writing the evicted data to disk, then reading the new data from disk into the buffer.

Least Recently Used list (LRU)

Page 15: Progress Openedge performance management

Asynchronous Page Writer (APW)

• Remember: eviction requires two steps: writing the evicted data to disk, then reading the new data from disk into the buffer.

• What if you recruit a new agent (let us call it the APW) that, in the background, writes dirty buffers* to disk and then marks them as available? This way, the database server only has to read data from the database into the buffer. So, only one operation!

• *Dirty buffer: a buffer that has been changed in memory but not yet written to disk.
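In OpenEdge, the APW runs as a separate background process started against a running database; a hedged example (the database path is hypothetical):

```shell
# Start one Asynchronous Page Writer; run the command once per APW wanted.
proapw /db/production
```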

Page 16: Progress Openedge performance management

Checkpoints

• A checkpoint is a process by which the in-memory and on-disk states of the database are reconciled.

• As transactions execute, changes are made to buffers in memory, and the version on disk becomes progressively more obsolete. During a checkpoint, all database changes are written to disk, making the volatile copy (data in memory) and the stable copy (data on disk) consistent.

• During a checkpoint, NO transaction activity can take place.

• Asynchronous Page Writers (APWs) minimize this overhead by periodically writing modified buffers to disk, so that when a checkpoint occurs, fewer buffers need to be written.

Page 17: Progress Openedge performance management

Checkpoint frequency/duration

• To view the checkpoint duration value:

• Access PROMON and enter R&D at the main menu.

• Choose option 3, Other Displays, then choose option 4, Checkpoints.

• A duration less than 1 second is a good goal.

Page 18: Progress Openedge performance management

Shared resources/Latches

• Let us come back to the restaurant example.

• In the restaurant there is only one toilet for all clients.

• The client who gets in first turns the latch to lock the door.

• When he gets out, he releases the latch, so the door is open and the toilet is available again.

• The LRU list is a shared resource with an exclusive lock. Only one APW can update it at a time.

• What happens if several APWs need to update the LRU list at the same time?

Page 19: Progress Openedge performance management

Spin/Nap

• When an APW needs to update the LRU (Least Recently Used) list, it attempts to acquire the LRU latch. If it cannot acquire the latch, it repeats the attempt. This iterative process is called spinning.

• If an APW fails to acquire the latch after a specified number of spins, it pauses, or takes a nap, before trying again.

• If an APW repeatedly fails to acquire the latch, the length of its nap is gradually increased.

• You can set the Spin Lock Retries (-spin) parameter to specify how many times to test a lock before napping.
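The spin-then-nap idea can be sketched conceptually in shell. This illustrates the pattern only, not Progress internals: a directory stands in for the latch (mkdir is atomic), and all names and values are illustrative.

```shell
latch=/tmp/lru.latch.$$   # unique path per process; stands in for the LRU latch
spin_retries=1000
nap_ms=10
tries=0
until mkdir "$latch" 2>/dev/null; do        # "spin": retry the acquisition
    tries=$((tries + 1))
    if [ "$tries" -ge "$spin_retries" ]; then
        sleep "$(awk "BEGIN{print $nap_ms / 1000}")"   # "nap" before retrying
        nap_ms=$((nap_ms * 2))              # naps lengthen on repeated failure
        tries=0
    fi
done
echo "latch acquired"
# ... update the LRU list here ...
rmdir "$latch"                              # release the latch
```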

Page 20: Progress Openedge performance management

Tuning -spin

• To view the resource waits value:

• Access PROMON and enter R&D at the main menu.

• Choose option 3, Other Displays, then choose option 1, Performance Indicators. Resource waits is the last item reported in the listing. Less than 10 per second is OK.

• A high “Latch timeouts” value could indicate that -spin is too low.
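A hedged example of raising the spin count at broker startup (the path and value are illustrative; measure resource waits before and after any change):

```shell
# Raise the number of spin retries before a process naps.
proserve /db/production -spin 5000
```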

Page 21: Progress Openedge performance management

Be careful!!

• By setting the -spin value higher, you can reduce resource waits.

• Continuing to raise it, however, can adversely affect CPU utilization.

Page 22: Progress Openedge performance management

Latch counts

• LRU, MRU, PWQ and CPQ are not the only latches used by Progress.

• To see all the latches:

• Access PROMON, enter R&D at the main menu, then enter the DEBUG menu.

• Choose option 6 (hidden menu), then choose option 11, Latch counts.

Page 23: Progress Openedge performance management

OM Latch

The Storage Object Cache holds copies of information about the most frequently accessed objects so that they can be found quickly, with minimal overhead.

Page 24: Progress Openedge performance management

Tuning OM Latch

• OM (Object Cache) latch activity can be eliminated entirely by setting the -omsize parameter equal to or greater than the number of _StorageObject records.
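A hedged example of setting the object cache size at broker startup (the path and value are hypothetical; the value should be at least the count of _StorageObject records in your database):

```shell
# Size the storage object cache so every _StorageObject record fits.
proserve /db/production -omsize 2048
```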

Page 25: Progress Openedge performance management

Tuning APW

• The PROMON R&D option Page Writers Activity display shows statistics about APWs running on your system.

• Nonzero numbers in the Flushed at Checkpoint row indicate that the APWs were unable to write buffers fast enough to prevent a memory flush.

• Increase the number of APWs and/or increase the cluster size to eliminate the flush.

Page 26: Progress Openedge performance management

Be careful!!

• The PRO of using APWs: improved performance, by:

• Ensuring that a supply of empty buffers is available, so the database engine doesn't have to wait for database buffers to be written to disk.

• Reducing the number of dirty buffers in memory, so that when a checkpoint occurs, fewer buffers must be written.

• The CON of using too many APWs: many conflicts on the LRU latch.

Page 27: Progress Openedge performance management

Database block size

A second way to decrease disk I/O operations is to use a larger database block size.

• Disk I/O transfers are made by block, so a larger block size provides greater I/O efficiency.

• On Linux, an 8 KB block size may be used with kernel versions 2.6 and newer.
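The block size is fixed when the database structure is created, so changing it implies a dump and reload into a new structure; a hedged sketch (database and structure file names are hypothetical):

```shell
# Create a new database structure with 8 KB blocks (-blocksize is in bytes).
prostrct create /db/production production.st -blocksize 8192
```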


Page 28: Progress Openedge performance management

Before Image

• Before-imaging is always enabled so that PROGRESS can recover transactions if the system fails. This mechanism is essential for database reliability, but it creates a significant amount of I/O that can affect performance.

• PROGRESS must always record a change in the BI file before it can record the change in the database and after-image (AI) files. If BI activity creates an I/O bottleneck, all other database activities are affected.

Page 29: Progress Openedge performance management

Monitoring Before Image activity

• Use the PROMON utility to monitor specific BI activity. Use the R&D option BI Log Activity. The following figure shows a sample display.

Page 30: Progress Openedge performance management

Before Image Writer

• The BIW is a background process that continually writes filled BI buffers to disk.

• BIW ensures that a supply of empty buffers is available to client and server processes.

• When no BIW is started, the database manager assumes the responsibility of flushing BI buffers from shared memory to disk.
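Like the APW, the BIW is started as a background process against a running database; a hedged example (the path is hypothetical):

```shell
# Start the Before Image Writer (at most one per database).
probiw /db/production
```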

Page 31: Progress Openedge performance management

Tuning BI buffers

• Increasing the number of BI buffers increases the availability of empty BI buffers to client and server processes.

• Increase BI buffers if there are any empty buffer waits in the PROMON Activity screen

• Increase BI buffers if “Writes by BIW” is not high enough. “Writes by BIW” is reported as a percentage of the total number of BI blocks written to disk.

Page 32: Progress Openedge performance management

BI block size

• PROGRESS reads and writes information to the BI file in blocks. Increasing the size of these blocks allows PROGRESS to read and write more data at one time. This can reduce I/O rates on disks where the BI files are located.
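A hedged example of changing the BI block size with proutil (the path and value are illustrative; the database must be offline and the BI file is truncated by the operation):

```shell
# Set a 16 KB BI block size (-biblocksize is in KB).
proutil /db/production -C truncate bi -biblocksize 16
```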

Page 33: Progress Openedge performance management

BI cluster size

• The BI file is organized into clusters on disk. As PROGRESS writes data to the BI file, these clusters fill up. When a cluster fills, PROGRESS must ensure that all modified database buffer blocks referenced by notes in that cluster are written to disk. This is known as a checkpoint.

• Raising the BI cluster size increases the interval between checkpoints.

• Raising the BI cluster size can reduce the I/O overhead of writing modified database buffers to disk.

• It also lets you defer writes and collect more changes before writing a block; this lets you write multiple changes with the same write.
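The cluster size is changed with the same offline proutil operation; a hedged example (the path and value are illustrative, to be tuned against observed checkpoint frequency):

```shell
# Set a 1024 KB BI cluster size (-bi is in KB).
proutil /db/production -C truncate bi -bi 1024
```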

Page 34: Progress Openedge performance management

BI cluster size

• Larger cluster sizes generally increase performance. They also have drawbacks:

• Increased disk space usage for the BI file.

• Longer crash recovery periods.

• Longer checkpoint times.

Page 35: Progress Openedge performance management

Index

Using bad indexes can lead to bad performance.

• If the CPU is waiting while the database server is busy searching for the required data, clients wait too long: bad performance.

• The PRO of having appropriate indexes: the disk controller can pick the right data with minimal disk I/O operations.

• The CON of having too many indexes: every update of a record requires updates to all the indexes related to the fields that were changed. Therefore, one record update might mean multiple I/O operations.

Page 36: Progress Openedge performance management

Locked records

• Long transactions can lead to too many record locks.

• The database server will, in this case, wait to access the locked records.

• Check “Rec Lock Waits” in PROMON and try to keep this percentage as low as possible by modifying the application to perform shorter transactions.

Page 37: Progress Openedge performance management

Tuning Locked records

• To find who is performing the most locks:

• Access PROMON and enter R&D at the main menu.

• Choose option 3, Other Displays, then choose option 3, Lock Requests By User. Identify the user number with the most locks.

Page 38: Progress Openedge performance management

Appserver agents

• The number of AppServer agents should be set correctly to avoid poor performance.

• Minimum servers: Progress recommends one AppServer agent per CPU on the machine.

• You can expect each AppServer agent to handle requests from up to 20 AppServer clients (or more).