Design-Build for Performance E-Learning Course Transcript


Designing and Building for Performance

E-Learning Course Transcript



Copyright 2008 Pegasystems Inc., Cambridge, MA

All rights reserved.

This document describes products and services of Pegasystems Inc. It may contain trade secrets and proprietary information. The document and product are protected by copyright and distributed under licenses restricting their use, copying, distribution, or transmittal in any form without prior written authorization of Pegasystems Inc.

This document is current as of the date of publication only. Changes in the document may be made from time to time at the discretion of Pegasystems. This document remains the property of Pegasystems and must be returned to it upon request. This document does not imply any commitment to offer or deliver the products or services described.

This document may include references to Pegasystems product features that have not been licensed by your company. If you have questions about whether a particular capability is included in your installation, please consult your Pegasystems service consultant.

For Pegasystems trademarks and registered trademarks, all rights reserved. Other brand or product names are trademarks of their respective holders.

Although Pegasystems Inc. strives for accuracy in its publications, any publication may contain inaccuracies or typographical errors. This document or Help System could contain technical inaccuracies or typographical errors. Changes are periodically added to the information herein. Pegasystems Inc. may make improvements and/or changes in the information described herein at any time.

This document is the property of: Pegasystems Inc. 101 Main Street Cambridge, MA 02142-1590

Phone: (617) 374-9600 Fax: (617) 374-9620

www.pega.com

Updated: March 18, 2008



Contents

Root Causes of Performance Issues

Performance Analysis Checklist

Identifying Design Time Warnings with Preflight

Design Time Warnings When Creating Classes

List View Unit Testing Considerations

Troubleshooting List View Report Performance

Design Time Warnings for Java Activity Method

Design Time Warnings for WriteNow and Obj-Save

Using the MyAlerts Tool to Unit Test Rule Performance

Optimizing Query Performance with DB Time Alert

Optimizing Performance with Browser Interaction Alert

Optimizing Performance with DB Bytes Read Alert

Optimizing Performance with Connect Total Time Alert

Uncovering Hidden Performance Issues with PAL

Using Alert and Error Logs to Enhance Performance

Blob Data Retrieval and Performance Implications



Root Causes of Performance Issues

Overall performance is a function of how each of the main components performs within an application. In PRPC, these components include the following:

Browser

PegaRULES database server

Other systems accessed via connectors and services

Application server in which PRPC resides

Network that each of these components uses to communicate

Each component has a capacity, and when the load approaches that capacity, performance begins to suffer. For instance:

CPU can become a bottleneck when developers write inefficient processing logic.

The application server's Java Virtual Machine (JVM) shows stress in the form of garbage collection overhead when too much memory (in the form of clipboard pages) is used.

When we send and/or receive too much data, pressure is placed on the network, the database, the browser, and the CPU combined.

Response time problems arise when an application requires too many separate data requests for each user request, because network latency is paid on every round trip. This is especially true in geographically dispersed architectures.
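To see how per-request latency adds up, the following minimal sketch compares issuing data requests one after another with consolidating them into a single request. It assumes every round trip pays the same fixed network latency. The class, its methods, and the 50 ms and 5 ms figures are hypothetical, not PRPC measurements.

```java
// Hypothetical illustration (not a PRPC API): a fixed network latency is paid on
// every round trip, so fanning one user request out into many sequential data
// requests multiplies that latency.
public class LatencyBudget {

    // Elapsed time when `calls` data requests are issued one after another,
    // each paying the network latency plus its own server-side processing time.
    static long sequentialMillis(int calls, long latencyMs, long processingMs) {
        return calls * (latencyMs + processingMs);
    }

    // The same work consolidated into one request: latency is paid once,
    // but the processing time for every item is still incurred.
    static long consolidatedMillis(int calls, long latencyMs, long processingMs) {
        return latencyMs + calls * processingMs;
    }

    public static void main(String[] args) {
        // Assumed figures: 20 data requests, 50 ms network latency, 5 ms processing each.
        System.out.println(sequentialMillis(20, 50, 5));   // 1100
        System.out.println(consolidatedMillis(20, 50, 5)); // 150
    }
}
```

The server does the same amount of work in both cases. Only the number of times the latency is paid changes, which is why the effect grows with distance between components.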

These capacity limitations should be considered when designing and creating PRPC rules. The PRPC application server component is in the ideal position not only to provide a view of its own load on the application server, but also to observe the performance of the components with which it communicates.



Performance Analysis Checklist

When designing a PRPC application, consider these four checklist items to ensure you address performance issues early.

1. Design Time Warnings: Be attentive to PRPC warnings. They indicate when something may not be quite right with one of your rules.

2. Performance Alerts: When unit testing, ensure rules run within the expected SLA time and do not use system resources above certain pre-set thresholds.

3. PAL: Inspect the run-time statistics that PAL collects during unit testing to identify signs of abnormal usage.

4. Application Functionality: Ensure there are no behind-the-scenes errors that may have performance implications.

Identifying Design Time Warnings with Preflight

To begin looking for design time warnings, start with the Preflight tool, which shows the number of violations of PRPC best practices.

From a performance perspective, pay particular attention to warnings on rule types related to:

Class

List View

Activity

To edit a rule, expand each Rule Type and click on the rule you wish to edit.

Design Time Warnings When Creating Classes

A significant design time warning pertains to how the PRPC concrete classes that persist data are mapped to the underlying database tables.

A significant advantage of PRPC is the ability to create new concrete classes, add properties, and never involve the DBA in day-to-day iterative development. This allows the developer to focus on rules development, rapidly create the application, and test it for functional completeness without having to involve the DBA at each step to maintain a schema that matches the application.



PRPC's default mapping of concrete classes is to one of four default (and very generic) tables:

PR_OTHER

PR_DATA

PR_HISTORY

PR_INDEX

These tables are bare bones and have neither exposed columns nor indexes to support your queries.

In this example, the class that holds the Home Codes data shows a warning. A rule can be saved as valid and still contain a warning, so look at the bottom of the rule for the associated warning messages.

When new classes are created, by default they are mapped to PR_OTHER. When the application is deployed, ideally no classes should be mapped to any of the four default tables.

You can remap concrete classes to database tables quickly and easily.

Using the Class Explorer, navigate to the Data-Admin-DB-Table instances, which show the mapping of PRPC classes to database table names.

This mapping takes into account a form of pattern inheritance. For example, a class named Data-Admin-XXX would map to pr_data_admin, yet a class named Data-MyRules-XXXX would map to PR_DATA. If no matching mapping row is found, the class is assumed to map to PR_OTHER.
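The pattern inheritance described above can be sketched as a lookup that tries the full class name first and then progressively shorter hyphen-delimited prefixes. This is a minimal illustration under that assumption, not PRPC's actual resolution code. `TableMapper` and its methods are hypothetical, and the mapping rows mirror the examples in the text.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of prefix-based ("pattern inheritance") class-to-table
// resolution; the mapping rows are illustrative, not read from Data-Admin-DB-Table.
public class TableMapper {
    private final Map<String, String> mappings = new HashMap<>();

    // Register a mapping row: a class name (or class-name prefix) to a table.
    void map(String classPrefix, String table) {
        mappings.put(classPrefix, table);
    }

    // Try the full class name first, then drop trailing "-segment" parts until
    // a mapping row matches; with no match at all, fall back to PR_OTHER.
    String resolve(String className) {
        String candidate = className;
        while (true) {
            String table = mappings.get(candidate);
            if (table != null) {
                return table;
            }
            int dash = candidate.lastIndexOf('-');
            if (dash < 0) {
                return "PR_OTHER";
            }
            candidate = candidate.substring(0, dash);
        }
    }

    public static void main(String[] args) {
        TableMapper mapper = new TableMapper();
        mapper.map("Data-Admin", "pr_data_admin");
        mapper.map("Data", "PR_DATA");
        System.out.println(mapper.resolve("Data-Admin-XXX"));       // pr_data_admin
        System.out.println(mapper.resolve("Data-MyRules-XXXX"));    // PR_DATA
        System.out.println(mapper.resolve("UServ-Data-HomeCodes")); // PR_OTHER
        mapper.map("UServ-Data-HomeCodes", "HomeCodes");
        System.out.println(mapper.resolve("UServ-Data-HomeCodes")); // HomeCodes
    }
}
```

In this model, adding a dedicated row for UServ-Data-HomeCodes is what moves the class off PR_OTHER.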

To fix this problem, create a new instance row, such as UServ-Data-HomeCodes, and map it to a new database table, such as HomeCodes.

Have the DBA model this new table on PR_OTHER, or on PC_WORK if it is a work object table.

The developer can also create indexes if filtering is needed.

After the class is mapped and the DBA has created the new HomeCodes table, test that the connection to the new table is working properly directly from the class rule.




List View Unit Testing Considerations

ListView rules are used extensively in most PRPC applications for section displays and reporting. Oftentimes, the output from the List View is functionally correct, and when run in the development environment, it runs with sub-second response time.

But will it continue to perform well in a production environment? Don't be fooled: this very simple listing could later become a serious production issue, potentially because of the following factors:

Test data returned is small: The amount of data returned in the test case is smaller than what will likely occur in production.

Tables contain only a few rows: The underlying tables for your test data are many times smaller than the production database, and inefficient access is masked by these very small data sets.

Missing indexes not apparent: The database indexes necessary for efficient query performance may not exist, but again, due to the small data sets, we don't see the negative effect in development.

Unexposed data: A more subtle problem may be that the data being reported on is buried deep within the special database column pzPvStream, and the resources required to parse out that data are extensive.
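The effect of small test tables can be shown with a toy model of row access. This sketch assumes a lookup by a single key. A sequential scan stands in for a missing index, and binary search over sorted keys stands in for an index. Real query planners are far more complex, and all names here are hypothetical.

```java
// Hypothetical illustration: entries examined when looking up one key without an
// index (sequential scan) versus with a sorted index (binary search).
public class ScanVsIndex {

    // No index: check rows one by one until the key is found.
    static int scanProbes(int[] rows, int key) {
        int probes = 0;
        for (int row : rows) {
            probes++;
            if (row == key) {
                break;
            }
        }
        return probes;
    }

    // Sorted index: binary search, counting how many entries are examined.
    static int indexProbes(int[] sortedKeys, int key) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int probes = 0;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            probes++;
            if (sortedKeys[mid] == key) {
                return probes;
            } else if (sortedKeys[mid] < key) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return probes;
    }

    // Build a table whose keys are 0..rows-1 (already sorted).
    static int[] table(int rows) {
        int[] keys = new int[rows];
        for (int i = 0; i < rows; i++) {
            keys[i] = i;
        }
        return keys;
    }

    public static void main(String[] args) {
        for (int rows : new int[] {100, 1_000_000}) {
            int[] keys = table(rows);
            int last = rows - 1; // worst case for the scan
            System.out.println(rows + " rows: scan=" + scanProbes(keys, last)
                    + ", index=" + indexProbes(keys, last));
        }
    }
}
```

At 100 rows, both approaches look instant. At a million rows, the scan examines every row while the index examines at most 20 entries. A development database hides this difference.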

Troubleshooting List View Report Performance

Always look to the bottom of the rule for any associated warning messages.

PRPC has detected two properties, Code and BaseCost, which are not explicitly defined as database columns.

The ListView report is running a database query to select the desired information. However, not all of the information needed in this case is stored in regular database columns.

Why is this a potential problem? When retrieving data, you must ensure that the underlying table accessed by the ListView has column names that exactly match all of the property names this ListView references. If the request referred only to properties that map to database columns, the pzPvStream column would not be needed. However, a single referenced property that is not exposed as a database column can cause the generated SELECT statement to read the pzPvStream column. This includes properties used for filtering and sorting as well.
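The rule just described reduces to a set difference. Collect every property the report references for display, filtering, or sorting, then remove those that have their own columns. If anything is left over, the pzPvStream must be read. The sketch below assumes that simplified model. `StreamCheck` is hypothetical, and pyID, Code, BaseCost, and Region are used only as example property names.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the rule above: any referenced property (displayed,
// filtered on, or sorted on) that is not an exposed column forces the generated
// query to read the pzPvStream BLOB as well.
public class StreamCheck {

    static Set<String> unexposedProperties(Set<String> exposedColumns,
                                           List<String> displayed,
                                           List<String> filteredOn,
                                           List<String> sortedOn) {
        Set<String> referenced = new LinkedHashSet<>();
        referenced.addAll(displayed);
        referenced.addAll(filteredOn);
        referenced.addAll(sortedOn);
        referenced.removeAll(exposedColumns); // keep only properties without a column
        return referenced;
    }

    static boolean needsStream(Set<String> exposedColumns, List<String> displayed,
                               List<String> filteredOn, List<String> sortedOn) {
        return !unexposedProperties(exposedColumns, displayed, filteredOn, sortedOn).isEmpty();
    }

    public static void main(String[] args) {
        Set<String> exposed = Set.of("pyID");
        List<String> shown = List.of("pyID", "Code", "BaseCost");
        System.out.println(unexposedProperties(exposed, shown, List.of(), List.of())); // [Code, BaseCost]
    }
}
```

Exposing Code and BaseCost empties the set. A sort-only property that is not exposed brings the stream read back.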

You can use PRPC's Database tool for viewing and modifying your database schema.




Creating database columns for properties, a process called exposing a property, does not require long discussions with a DBA; rather, simply give the DBA the generated SQL.

For rows that already exist in the table, run the resaver servlet to populate the newly created columns. (See the system administration guides for details.)

The more data, the longer it takes to run this resaver tool. Exposing the required properties during design time can avoid this expensive operational step.

Design Time Warnings for Java Activity Method

Java code warrants a warning because the code itself may perform poorly, be difficult to maintain, or simply be used for debugging, as it is in this case. This warning pertains to guardrail #4, Limit Hand-Coded Java.

Oftentimes, programmers who are very familiar with Java tend to resort to Java steps instead of using the preferred, easier-to-maintain PRPC out-of-the-box methods that do the same thing.

Verbose logging may seem innocuous, but it can have a noticeable impact on performance for a few reasons:

Using println() in this case writes synchronously to sysout, which can create a bottleneck.

Using the PRPC oLog() method is a better approach that uses the Log4J facility instead, which is asynchronous and more efficient.

Also, note how the string being written is made up of a concatenated set of sub strings and tokens. This process can also throw off garbage in the JVM as the string concatenation can be inefficient from a memory management perspective. The token generated from the call to getXML() can be quite large depending on the size of that object and can also be memory intensive.

Instead of reverting to Java, a better practice is to use the PRPC out-of-the-box (OOTB) method Log-Message, which accomplishes the same goal without resorting to more complex Java coding syntax.

As a debugging aid, unconditional logging can be useful, but it is not production-worthy code.
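The underlying principle can be shown outside PRPC: do not build a large log message unless it will actually be written. The sketch below uses java.util.logging from the JDK as a stand-in. It is not PRPC's oLog API or the Log-Message method, and `expensiveDump()` is a hypothetical placeholder for a call like getXML().

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Stand-in illustration using java.util.logging, not PRPC's oLog or Log4J:
// the expensive message is only built when the logger's level allows it.
public class GuardedLogging {
    private static final Logger LOG = Logger.getLogger(GuardedLogging.class.getName());
    static int dumpsBuilt = 0; // how many times the expensive dump was produced

    // Hypothetical stand-in for an expensive call such as getXML() on a large page.
    static String expensiveDump() {
        dumpsBuilt++;
        StringBuilder sb = new StringBuilder("<page>");
        for (int i = 0; i < 10_000; i++) {
            sb.append("<item>").append(i).append("</item>");
        }
        return sb.append("</page>").toString();
    }

    // Turn verbose (FINE) logging on or off for this logger only.
    static void setVerbose(boolean verbose) {
        LOG.setLevel(verbose ? Level.FINE : Level.INFO);
    }

    static void logStep(String step) {
        // The Supplier lambda runs only if FINE is loggable, so with verbose
        // logging off, neither the concatenation nor the dump is performed.
        LOG.fine(() -> "Step " + step + " page=" + expensiveDump());
    }

    public static void main(String[] args) {
        setVerbose(false);
        logStep("1");
        System.out.println(dumpsBuilt); // 0
    }
}
```

java.util.logging is not asynchronous. The point here is only that the Supplier form skips the string building and the large dump entirely when the level is disabled.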




Design Time Warnings for WriteNow and Obj-Save

PRPC's default behavior for performing Obj-Save is to defer the actual UPDATE/INSERT SQL command until a Commit is issued. The benefit of this deferral is that you can perform multiple back-to-back saves on the very same object instance, and PRPC combines these updates into a single cumulative update at commit time. At commit, all the INSERTs, UPDATEs, and DELETEs are issued at once. This maximizes concurrency within the application.

Specifying WriteNow defeats all the benefits of the deferred save feature.

There are legitimate reasons for performing a WriteNow, such as needing to immediately re-read this instance before issuing a commit. Use of this option can be dangerous if proper locking and error checking are not done. This is not common practice, however, which is why these are called Warnings and not Errors.
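A minimal sketch of the deferred-save idea follows, assuming a simple in-memory model. Pending saves are keyed by instance, so repeated saves of the same instance overwrite one another until commit. It is not PRPC's implementation, and the class, its methods, and the generic WRITE statements are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch (not PRPC's implementation) of deferred saves: repeated
// saves of the same instance collapse into one pending write, and all pending
// writes are issued together at commit. A WriteNow-style save bypasses this.
public class DeferredSaveSketch {
    private final Map<String, String> pending = new LinkedHashMap<>();
    final List<String> issued = new ArrayList<>(); // statements "sent" to the database

    // Deferred save: replace any earlier pending image of the same instance.
    void save(String key, String image) {
        pending.put(key, image);
    }

    // WriteNow-style save: issue the write immediately, with no coalescing.
    void saveWriteNow(String key, String image) {
        pending.remove(key); // this image supersedes any deferred one
        issued.add("WRITE " + key + " " + image);
    }

    // Commit: one write per instance with pending changes, issued together.
    void commit() {
        for (Map.Entry<String, String> entry : pending.entrySet()) {
            issued.add("WRITE " + entry.getKey() + " " + entry.getValue());
        }
        pending.clear();
    }

    public static void main(String[] args) {
        DeferredSaveSketch deferred = new DeferredSaveSketch();
        deferred.save("W-1", "v1");
        deferred.save("W-1", "v2");
        deferred.save("W-1", "v3");
        deferred.commit();
        System.out.println(deferred.issued); // [WRITE W-1 v3]

        DeferredSaveSketch immediate = new DeferredSaveSketch();
        immediate.saveWriteNow("W-1", "v1");
        immediate.saveWriteNow("W-1", "v2");
        immediate.saveWriteNow("W-1", "v3");
        System.out.println(immediate.issued.size()); // 3
    }
}
```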

Using the MyAlerts Tool to Unit Test Rule Performance

While PRPC's design time warnings are very powerful and instructive, it is not possible to detect all performance-related issues by simply looking at the structure of the rules and the database. It is important to determine whether the rules, when run, actually behave as expected.

PRPC has a built-in monitor, called alerts, that looks for certain performance-related behaviors of components and checks them against pre-set thresholds to see if they are performing outside the norm.

When unit testing rules, particularly activities and flows, check if any of these alerts are being generated.

To check for alerts, select MyAlerts from the Tools menu. The alerts generated for your session are listed.

The values in the Alert Type column correspond to Pega alert codes. For example, the Browser Interaction alert type corresponds to Pega alert code PEGA0001.

The Value column provides a measure of the elapsed time in seconds that triggered the alert. If there are several alerts, it is best to focus first on those that are most severe and most prevalent.
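When the MyAlerts list is long, a simple grouping makes the most prevalent and most severe alerts stand out. The following is a hypothetical helper, not part of the MyAlerts tool. It assumes the Alert Type code and Value column have been copied out of the list. It ranks codes by count and breaks ties by the largest value, which is only one reasonable ordering.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical triage helper (not part of the MyAlerts tool): group alert rows
// by alert code, rank codes by how often they occur (prevalence), and break
// ties by the worst value seen (severity).
public class AlertTriage {
    static final class Alert {
        final String code;
        final double value;

        Alert(String code, double value) {
            this.code = code;
            this.value = value;
        }
    }

    static final class Summary {
        final String code;
        int count = 0;
        double maxValue = Double.NEGATIVE_INFINITY;

        Summary(String code) {
            this.code = code;
        }
    }

    static List<Summary> rank(List<Alert> alerts) {
        Map<String, Summary> byCode = new LinkedHashMap<>();
        for (Alert alert : alerts) {
            Summary summary = byCode.computeIfAbsent(alert.code, Summary::new);
            summary.count++;
            summary.maxValue = Math.max(summary.maxValue, alert.value);
        }
        List<Summary> ranked = new ArrayList<>(byCode.values());
        // Most frequent first; among equally frequent codes, the worst value first.
        ranked.sort(Comparator.comparingInt((Summary s) -> s.count).reversed()
                .thenComparing(Comparator.comparingDouble((Summary s) -> s.maxValue).reversed()));
        return ranked;
    }

    public static void main(String[] args) {
        List<Alert> alerts = List.of(
                new Alert("PEGA0007", 0.8),
                new Alert("PEGA0001", 2.5),
                new Alert("PEGA0007", 1.2),
                new Alert("PEGA0007", 0.6));
        for (Summary s : rank(alerts)) {
            System.out.println(s.code + " x" + s.count + ", worst " + s.maxValue);
        }
    }
}
```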




There are over 20 different types of alerts; the four most common are:

1. DB Time (PEGA0007)

2. Browser Interaction (PEGA0001)

3. DB Bytes Read (PEGA0004)

4. Connect Total Time (PEGA0020)
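The four alerts above all follow the same pattern: a measured value is compared with a configured threshold. The sketch below encodes the typical defaults quoted in this course: 500ms for DB Time, 1000ms for Browser Interaction and Connect Total Time, and 50MB for DB Bytes Read. It is a hypothetical model, not PRPC configuration. Actual thresholds are set per system, and reading 50MB as 50 x 1024 x 1024 bytes is an assumption.

```java
// Hypothetical sketch of threshold checks for the four common alerts, using the
// typical thresholds quoted in this course. Real thresholds are configurable, and
// treating 50MB as 50 x 1024 x 1024 bytes is an assumption.
public class AlertThresholds {
    enum AlertType {
        BROWSER_INTERACTION("PEGA0001", 1_000),       // elapsed milliseconds
        DB_BYTES_READ("PEGA0004", 50L * 1024 * 1024), // bytes read per interaction
        DB_TIME("PEGA0007", 500),                     // elapsed milliseconds
        CONNECT_TOTAL_TIME("PEGA0020", 1_000);        // elapsed milliseconds

        final String code;
        final long threshold;

        AlertType(String code, long threshold) {
            this.code = code;
            this.threshold = threshold;
        }

        // An alert is raised when the measured value exceeds the threshold.
        boolean fires(long measured) {
            return measured > threshold;
        }
    }

    public static void main(String[] args) {
        System.out.println(AlertType.DB_TIME.code + " fires at 650 ms: " + AlertType.DB_TIME.fires(650));
    }
}
```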

Optimizing Query Performance with DB Time Alert

DB Time (PEGA0007) is triggered when a call to the PegaRULES database exceeds a certain threshold (typically 500ms).

Queries that take a long time to run, especially in development, will likely run even longer in a production environment when hundreds of users are accessing the database at the same time.

There are many root causes for why a query might take too long to run.

Indexing a table: There may be a need to add an index to a table to make the query plan more efficient.

SQL syntax: The syntax of the generated SQL may be too complex for the DBMS to generate an efficient query access plan.

Data filtering: The filtering of data may not be fine enough, so the query tries to return too many rows.

Record locking: There may be contention on the database server due to record locking.

Database server capacity: The capacity of the database server may be impacted by other, unrelated processing such as backups, reports, or reorganizations.

Data returns: You are returning too much data, such as very large BLOBs, via the pzPvStream column.

Use this view to see the exact query syntax, as well as any bind variables to the SQL, and determine the exact data this query is trying to retrieve. When it is difficult to determine the root cause visually:

Rerun this SQL in an interactive tool, such as SQL*PLUS, TOAD, DBArtisan or Enterprise Manager, to recreate the execution event, or

Provide this SQL to a DBA for more detailed analysis of the query plan generated




To put this alert into perspective, the MyAlerts tool provides the Last Step and Trace List, which link to the exact rules and sequence of steps that led up to this alert event. Oftentimes, fixing a poorly running query is done by modifying the rules that led up to and were used to generate the SQL.

Optimizing Performance with Browser Interaction Alert

The Browser Interaction (PEGA0001) alert compares the PRPC response time for the browser request against a system-wide threshold, typically 1000ms (1 second). With the goal of each user interaction being sub-second, these alerts indicate a slow-running process that needs to be decomposed to understand where and how to reduce the overall run time.

Optimizing Performance with DB Bytes Read Alert

The DB Bytes Read (PEGA0004) alert (also known as the byte governor) is triggered whenever an interaction with PRPC causes more than 50MB of data, which is a large amount of data for a single request, to be read from the PegaRULES database. Typically, this happens when working with lists of data via ListView, SummaryView, and Obj-List rules.

When this alert is generated, it is essential to locate the rules that cause this much data to be read and limit them by adding more criteria, capping the rows returned, and/or reducing the number of data columns returned. This alert is often triggered when a rule requires reading the BLOB, or pzPvStream, which can also be quite large.

Since the 50MB threshold is high, consider asking the PRPC administrator to reduce it to 20MB or less.

Also, there is an option to terminate the PRPC task when this threshold is encountered. It is highly recommended that this option be enabled to prevent runaway, unbounded queries from consuming all available memory and crashing the system.
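The byte governor and its terminate option can be modeled as a counter that is checked after every read. This is a hypothetical sketch, not PRPC code. The class, the exception type, and the 1 MB chunk size are assumptions, and the 20MB limit follows the suggestion above.

```java
// Hypothetical sketch of a per-interaction byte governor, not PRPC's code:
// bytes read are accumulated, an alert is flagged once the limit is exceeded,
// and, if the terminate option is on, the task is aborted at that point.
public class ByteGovernor {
    static final class LimitExceededException extends RuntimeException {
        LimitExceededException(String message) {
            super(message);
        }
    }

    private final long limitBytes;
    private final boolean terminateOnLimit;
    private long bytesRead = 0;
    private boolean raised = false;

    ByteGovernor(long limitBytes, boolean terminateOnLimit) {
        this.limitBytes = limitBytes;
        this.terminateOnLimit = terminateOnLimit;
    }

    // Called after each chunk of rows is read from the database.
    void recordRead(long bytes) {
        bytesRead += bytes;
        if (bytesRead > limitBytes) {
            raised = true; // where a DB Bytes Read alert would be logged
            if (terminateOnLimit) {
                throw new LimitExceededException(
                        "Read " + bytesRead + " bytes; limit is " + limitBytes);
            }
        }
    }

    boolean alertRaised() {
        return raised;
    }

    public static void main(String[] args) {
        // Assumed 20MB limit, taken as 20 x 1024 x 1024 bytes.
        ByteGovernor governor = new ByteGovernor(20L * 1024 * 1024, true);
        try {
            for (int chunk = 0; chunk < 100; chunk++) {
                governor.recordRead(1024L * 1024); // 1 MB per chunk of rows
            }
        } catch (LimitExceededException e) {
            System.out.println("Terminated: " + e.getMessage());
        }
    }
}
```

Without the terminate option, the loop would keep reading and only the alert flag would record the problem. That is why enabling it protects memory.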




Optimizing Performance with Connect Total Time Alert

The Connect Total Time (PEGA0020) alert is triggered when a connector method call to an external system exceeds the system threshold of 1000ms. Oftentimes, a single request to PRPC may require multiple calls to connectors to external systems, and if these perform poorly, PRPC users are likely to experience long response times.

Typically, the bottleneck is not PRPC but the other application being accessed. A bottleneck can also occur if the data being sent to or received from this connector is large and complex.

Fixing these alerts often requires restructuring the design by:

Fixing the third-party application to process the requests more efficiently

Reducing the payloads

Reducing the complexity of the data

Consolidating some of the connector calls to avoid network latency

Using more efficient connector types (for instance, MQ is more efficient than SOAP)

This can take quite a bit of time to accomplish, so the earlier these issues are detected the better.

Uncovering Hidden Performance Issues with PAL

Hidden performance issues are cases where run times or resources used are not high enough to trigger an alert, but are still high enough to indicate that more system resources are being used than expected.

We will address these issues using the Performance Analyzer (PAL).

Before doing anything, click the Reset Data link twice to initialize the PAL readings to zero. After each screen, click the Add Reading link to obtain a DELTA reading for each. The PAL readings for each screen in the flow provide a story. The summary view enables you to quickly identify spots where significant performance spikes occur in a process.
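DELTA readings are differences between consecutive cumulative snapshots. That is why resetting first matters: the first snapshot becomes the zero point. The sketch below assumes four illustrative counters. It is not PAL's implementation, and `PalDeltas` and `Reading` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of PAL-style DELTA readings, not PRPC's implementation:
// cumulative counters are snapshotted after each screen, and each delta is the
// difference between consecutive snapshots. Counter names are illustrative.
public class PalDeltas {
    static final class Reading {
        final long interactions;
        final long cpuMillis;
        final long elapsedMillis;
        final long bytes;

        Reading(long interactions, long cpuMillis, long elapsedMillis, long bytes) {
            this.interactions = interactions;
            this.cpuMillis = cpuMillis;
            this.elapsedMillis = elapsedMillis;
            this.bytes = bytes;
        }

        Reading minus(Reading earlier) {
            return new Reading(interactions - earlier.interactions,
                    cpuMillis - earlier.cpuMillis,
                    elapsedMillis - earlier.elapsedMillis,
                    bytes - earlier.bytes);
        }
    }

    // snapshots.get(0) is the all-zero reading taken right after Reset Data.
    static List<Reading> deltas(List<Reading> snapshots) {
        List<Reading> result = new ArrayList<>();
        for (int i = 1; i < snapshots.size(); i++) {
            result.add(snapshots.get(i).minus(snapshots.get(i - 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Reading> snapshots = List.of(
                new Reading(0, 0, 0, 0),
                new Reading(2, 120, 400, 30_000),
                new Reading(5, 300, 1_400, 160_000));
        for (Reading d : deltas(snapshots)) {
            System.out.println(d.interactions + " interactions, " + d.cpuMillis + " ms CPU, "
                    + d.elapsedMillis + " ms elapsed, " + d.bytes + " bytes");
        }
    }
}
```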

Note: Reset and redo the readings if significant values appear in the RA columns (representing rules assembly). Rules assembly skews the results significantly, and PAL should be analyzed only after rules assembly has occurred.




If rules assembly persists, work with the system administrator to determine if there is a configuration issue with PRPC.

Understanding PAL Int Count Values

The Int Count value shows how many times the browser communicated with PRPC for this request. Typical screens take from 1 to 3 interactions. Complex screens with a lot of dynamic content may show a much higher number. The more interactions per request, the more overall network latency builds up. This view shows how much total time it took from screen to screen and how the time was spent (that is, which categories it falls into).

Understanding the PAL CPU Time Values

CPU time can indicate whether the process is computationally intensive. For example, a screen that used 0.18 seconds of CPU would, in a single-user development environment, have a sub-second Total Elapsed response time. However, remember that this one screen is using about one-fifth of the total CPU on the machine. In a production system, this does not leave much capacity for more than 4 or 5 more concurrent users. By comparing Total CPU time to Total Elapsed time, the developer can understand how CPU-bound the request was. If Total Elapsed is 0.20 seconds and Total CPU is 0.18 seconds, then that request was almost all CPU.
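The CPU reasoning above can be written out as two ratios. This sketch assumes a single CPU and one such screen per user per second. The helper names are hypothetical.

```java
import java.util.Locale;

// Worked version of the CPU arithmetic above (hypothetical helper names). It
// assumes a single CPU and that each user requests one such screen per second.
public class CpuBudget {

    // Share of a request's elapsed time that was spent on CPU (1.0 = fully CPU-bound).
    static double cpuBoundRatio(double cpuSeconds, double elapsedSeconds) {
        return cpuSeconds / elapsedSeconds;
    }

    // Upper bound on such requests one CPU can complete per second.
    static double requestsPerCpuSecond(double cpuSecondsPerRequest) {
        return 1.0 / cpuSecondsPerRequest;
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "CPU-bound ratio: %.2f%n", cpuBoundRatio(0.18, 0.20));          // 0.90
        System.out.printf(Locale.ROOT, "Requests per CPU-second: %.2f%n", requestsPerCpuSecond(0.18)); // 5.56
    }
}
```

A ratio of 0.90 confirms that the request was almost all CPU. Roughly 5.6 such screens per CPU-second is consistent with the estimate of only 4 or 5 more concurrent users.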

CPU numbers are not available for UNIX-based systems, which is why it is important for the installation to have one Windows-based machine available for PAL analysis during development and testing.

Understanding PAL Rule Count Values

Rule Counts provide insight into the number of rules being executed between readings. Are they in line with the number of rules you have coded or are they 10x, 100x or more? Very high Activity counts can be very useful in identifying poor performing looping conditions in the process or excessive procedural execution. High Connect counts give you an idea as to how chatty your logic is with the back end systems. Five or 10+ connector calls per request could add up quickly to long overall response times.

Understanding PAL Total Byte Values

The Total Bytes counter indicates how much data is moving between the client and the server. This counter helps determine whether there are very heavy screens or whether large amounts of data are being sent to the server. Numbers of 100k or more are certainly worth looking into, as they will strain the network and the browser machine that must process all that HTML and data.




Using PAL to Determine Clipboard Size

Click ADD READING WITH CLIPBOARD SIZE to produce a statistic on the detail view showing how big the clipboard is as of this reading. Clipboard sizes that exceed 10k indicate that the application may be requesting too much data or is not removing pages soon enough in the process. Remember, memory is one of those precious resources that must be used judiciously.

The WITH CLIPBOARD SIZE option takes extra resources to produce these statistics, so you should not rely on the CPU and Elapsed Time readings when using this option.

Understanding the PAL Detail Screen

Click the link under one of the DELTA readings that requires further understanding, possibly due to a long Total Elapsed or a high Total bytes, to see over 100 different PAL statistics presented.

Do not be intimidated by the list of numbers, as there are a few here that are of particular interest, while the other more esoteric values can be researched if you happen to require it at a later time.

In the Database Access Counts section, search for any that happen to have the term Storage Stream in the label. These pertain to measurements related to reading or writing the BLOB or pzPvStream to/from the database. These can be particularly resource intensive. Looking at the example, note the following readings:

The Bytes read from database Storage Stream value is over 4MB, which is too high and requires adjusting.

101 rows were returned from Obj-List that required all of the Storage Stream, which indicates that an Obj-List method needs to be fixed.

In the Requestor Summary section, the Number of Output Bytes sent from the server is over 3MB. The next item down is the requestor clipboard size of 3MB. This is a huge number and significantly limits how many concurrent users can use this system in production.

Scanning over all the PAL detailed statistics, any value that seems to be particularly high may be worth inspecting in more detail.




Using Alert and Error Logs to Enhance Performance

Before deciding that an application is ready to promote to the next level, be it UAT, Pre-Production, or Production, check the PegaRULES Alert log file and the PegaRULES Error log file.

The ALERT log file should be empty if MyAlerts has been checked along the way. If not, determine which ALERTS are continuing to be presented and why.

More importantly, check the Pega error log file, since many errors can occur that do not appear to significantly affect the user experience, at least not when unit testing. A failed call to a connector, a failed rule execution, a null pointer exception, or a missing or inconsistent rule can have a significant impact on the quality of the application.

Aside from being a PRPC best practice, it is important to ensure there are no errors in the log file from a performance perspective because:

Logging the errors and producing stack traces is resource intensive

It is difficult to see performance related messages when the log file is riddled with other errors

Some errors may have the side effect of slowing down the system

Blob Data Retrieval and Performance Implications

The full PRPC object image is stored in the database table within the pzPvStream column, defined as a BLOB datatype. BLOBs are very efficient and flexible mechanisms for object storage because:

They are compressed by PRPC for low storage overhead.

They can hold any amount of information since there are no physical limit constraints, such as maximum column lengths or page sizes.

The object structure can be very complex, holding many levels of nested structures and repeating groups of information.

The object structure can change from instance to instance without the involvement of the DBA in making complex changes to the database schema.
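The flip side of compact, flexible storage is the cost of reading it. This can be illustrated with JDK classes. In this sketch, GZIP and a simple name=value text format stand in for PRPC's own stream format, which this course does not describe. `BlobSketch` and the sample properties are hypothetical. Storage is compact, but even a single property lookup must inflate and scan the whole blob.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Illustrative only: GZIP plus a simple "name=value" text format stand in for
// PRPC's stream format. Names and values must not contain '\n', and names
// must not contain '='.
public class BlobSketch {

    // Write every property into one compressed blob.
    static byte[] toBlob(Map<String, String> properties) {
        StringBuilder text = new StringBuilder();
        for (Map.Entry<String, String> e : properties.entrySet()) {
            text.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(text.toString().getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reading even one property means inflating and scanning the whole blob.
    static String readProperty(byte[] blob, String name) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(blob))) {
            String text = new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
            for (String line : text.split("\n")) {
                int eq = line.indexOf('=');
                if (eq > 0 && line.substring(0, eq).equals(name)) {
                    return line.substring(eq + 1);
                }
            }
            return null; // property not present in the blob
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> workObject = new LinkedHashMap<>();
        workObject.put("Code", "HC-01");
        workObject.put("BaseCost", "125.00");
        workObject.put("History", "x".repeat(20_000)); // bulky data nobody asked for
        byte[] blob = toBlob(workObject);
        System.out.println("blob bytes: " + blob.length);
        System.out.println("Code = " + readProperty(blob, "Code"));
    }
}
```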




The following is an example of what occurs when requesting an object that requires reading the stream.

1. The DBMS executes a Select to retrieve the object; a typical BLOB for a work object is 20,000 bytes or more in size.

2. During this operation, three extra I/O operations are performed to read the pzPvStream as this 20k of data is stored in 8k chunks on their own data pages.

3. This 20k of data is transmitted as a stream across the network.

4. The JVM allocates a large chunk of memory to receive this data. The stream is uncompressed using CPU resources and requires even more memory to hold the uncompressed data. This stream is parsed to find the data for the two referenced properties.

5. The two requested clipboard property instances are created, consuming more CPU.

6. Finally, the temporary memory is discarded, and the JVM soon invokes the garbage collector and takes more CPU to reclaim this 40k.

When reading just one instance of an object, such as when opening a work object, this overhead is hardly noticed. When using a List View, however, this process happens for each row in the list. The typical cap on List View reports is 500 rows, which means it is easy to generate 20MB of garbage (500 x 40,000 bytes). Multiply this by 100 users and 2GB of garbage is generated!
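The arithmetic above can be checked directly. The figures come from the text. Reading "8k" as 8,192 bytes is an assumption, and the helper names are hypothetical.

```java
// Worked version of the BLOB arithmetic above. The 20,000-byte BLOB, 8k pages,
// 40,000 bytes of garbage per row, 500-row cap, and 100 users come from the text;
// reading "8k" as 8,192 bytes is an assumption.
public class BlobGarbage {

    // Data pages needed to hold a BLOB, i.e. the extra page reads per row.
    static long pagesForBlob(long blobBytes, long pageBytes) {
        return (blobBytes + pageBytes - 1) / pageBytes; // ceiling division
    }

    // Temporary memory discarded after reading `rows` rows.
    static long garbageBytes(long rows, long garbagePerRow) {
        return rows * garbagePerRow;
    }

    public static void main(String[] args) {
        long pages = pagesForBlob(20_000, 8_192);  // 3
        long perList = garbageBytes(500, 40_000);  // 20,000,000 bytes (about 20MB)
        long allUsers = perList * 100;             // 2,000,000,000 bytes (about 2GB)
        System.out.println(pages + " pages, " + perList + " bytes per list, "
                + allUsers + " bytes for 100 users");
    }
}
```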

This is one reason why memory allocation in production systems becomes excessive and how garbage collection within the JVM becomes strained. High CPU usage and high network utilization also affect user response time. Fortunately, by exposing columns, this issue can be easily remedied.
