
Top 200 Terrific PL/SQL Tuning Tips

Steven Feuerstein, HA-LO Industries / RevealNet

Developing for the Internet Platform -- Paper #407

Why Tune PL/SQL Code?

You can spend all day tuning your SQL statements. You can apply any one of a number of fantastic first-party (Oracle) and third-party analysis tools to the SQL side of your application. You can bring in an Oracle tuning specialist. Yes, you can tune and tune and tune -- and still your PL/SQL-based programs can run very inefficiently. It is difficult to find detailed, comprehensive recommendations for tuning the PL/SQL side of one's Oracle applications.

After spending 12 years learning, using and writing about Oracle's PL/SQL language, I have assembled 1,247 tuning tips for PL/SQL developers. I have even selected the top 200 most terrific of these tips for this presentation -- yeah, right! OK, to tell the truth, I do not have 1,247 tuning tips for PL/SQL developers. I suppose it is possible that I have 200 tuning tips. But could I possibly offer them to you in the 6-10 pages allowed in the ECO/SEOUC format? Could I possibly present all these tips in the span of time allowed for the talks at Oracle conferences? NO WAY! So I hope that you are not mad at me. And I will do my best not to let you down by using this paper to focus on a time-honored tuning principle: Avoid unnecessary code execution!

This would seem to be rather obvious advice. Surely, if the code is not necessary I would not include it in my program. And just as surely, after I have written my code I can easily identify extraneous lines when I do my code review and remove them. Sure. And under the pressure of the moment, I have never, not possibly ever, found myself taking short-cuts, doing things that I know are not right, because I just don't have the time to pay the necessary attention to write optimized code. And my manager understands the value of code review, of code walkthroughs, so we always have time for that very critical step for the improvement of code quality.

The reality is that we usually rush headlong into our coding tasks, feeling just a bit panicky and overwhelmed, annoyed that management won't buy us good tools and, fundamentally, on our own to get the job done and done well. And that is why, contrary to the laws of logic, it is usually all too easy to identify code that really does not need to be executed in our programs. Removing such code will usually improve performance, sometimes in a dramatic fashion. This paper offers advice on how to identify and avoid extraneous code execution, specifically within the context of the Oracle PL/SQL language.

The Search for Unnecessary Code

I must, first of all, put this tuning exercise in context: I do not suggest that you read this article and then scour every line of code in your application in search of potential performance gains. You might find lots of changes to make -- and when you are done making those changes, you might also discover that the response time of your application has not improved enough for your users to even notice. Don't take a theoretical approach to your tuning efforts (the theory being that every line of code in my application should be perfect in every way: performance, readability, etc.). Your time will be much better spent identifying the specific programs that cause the most noticeable bottlenecks, and then focusing very tightly on those programs. Once you have selected the area in your application that needs tuning, follow this advice to more rapidly find the problems:



Check your loops: code within a loop (FOR, WHILE and simple) executes more than once (usually). Any inefficiency in a loop's scope has, therefore, a potentially multiplying effect.

Check your SQL statements: first of all, you should of course make sure that your SQL statements have been optimized. That topic is outside the scope of this article; there are many fine tools and books that will help you tune your SQL. There are situations, however, when pure SQL results in excessive execution -- and when the judicious use of PL/SQL can improve that statement's performance.

Review heavily patched sections of code: in any complex program that has a lifespan of more than six months, you will usually find a section that has been changed again and again and again. It is very easy to allow inefficiencies to slip in with such incremental changes.

Don't take the declaration section for granted: sure, that's the place where you declare all variables, give them their initial values, etc. It is quite possible, however, that some actions taken in that section (the declarations themselves or the defaulting code) are not always needed and should not always be run on startup of the block.

I explore these and other topics in the remainder of this article.

Check Your Loops

Code within a loop (FOR, WHILE and simple) executes more than once (usually). Any inefficiency in a loop's scope has, therefore, a potentially multiplying effect. In one tuning exercise for a client, I discovered a thirty-line function that ran in less than half a second, but was executed so frequently that its total elapsed time for a run was five hours. Focused tuning on that one program reduced its total execution time to less than twenty minutes. Always go to your loops first and make sure you are not introducing such a problem. Here is an obvious example; my procedure accepts a single name argument and then, for each record fetched from a packaged cursor, it processes the record.

   PROCEDURE process_data (nm_in IN VARCHAR2)
   IS
   BEGIN
      FOR rec IN pkgd.cur
      LOOP
         process_rec (UPPER (nm_in), rec.total_production);
      END LOOP;
   END;

The problem with this code is that I apply the UPPER function to the nm_in argument for every iteration of the loop. That is unnecessary, since the value of nm_in never changes. I can easily fix this code by declaring a local variable to store the upper-cased version of the name:

   PROCEDURE process_data (nm_in IN VARCHAR2)
   IS
      v_nm some_table.some_column%TYPE := UPPER (nm_in);
   BEGIN
      FOR rec IN pkgd.cur
      LOOP
         process_rec (v_nm, rec.total_production);
      END LOOP;
   END;

It is not always so easy, of course, to spot redundant code execution. In this example, one would assume that I upper-cased the name because either (a) I knew for certain that process_rec would not work properly with a lower or mixed-case string, or (b) I am not really sure how process_rec works and so "take out insurance" to head off any possible problems. If I have found the process_data procedure to be a bottleneck, it is very important that I understand how all of the code on which it depends works. An incorrect assumption can interact in very nasty ways with the algorithms of underlying programs. It may well be the case, for example, that process_rec always performs an upper-case conversion on its first parameter. That would make my UPPER unnecessary, and its own UPPER excessive when applied to an already upper-cased string.


In this situation, it might well be necessary to ask the author of process_rec to remove the UPPER action, or make it optional by providing a third argument, perhaps something like this:

   PROCEDURE process_rec (
      name_in IN VARCHAR2,
      prod_in IN NUMBER,
      uc_name_in IN BOOLEAN := TRUE)
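Inside process_rec, the flag might then be applied along these lines (a sketch only; the body and the local variable shown here are hypothetical, not part of the original program):

   PROCEDURE process_rec (
      name_in IN VARCHAR2,
      prod_in IN NUMBER,
      uc_name_in IN BOOLEAN := TRUE)
   IS
      -- Hypothetical local variable; upper-case only when the caller asks for it.
      v_name VARCHAR2(100);
   BEGIN
      IF uc_name_in
      THEN
         v_name := UPPER (name_in);
      ELSE
         v_name := name_in;
      END IF;

      -- ... the rest of the processing works with v_name and prod_in ...
      NULL;
   END;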

and then I can pass FALSE for the third argument to bypass the UPPER. Here is a block of code with a loop. See if you can identify areas for improvement:

   DECLARE
      CURSOR emp_cur
      IS
         SELECT last_name,
                TO_CHAR (SYSDATE, 'MM/DD/YYYY') today
           FROM employee;
   BEGIN
      FOR rec IN emp_cur
      LOOP
         IF LENGTH (rec.last_name) > 20
         THEN
            rec.last_name := SUBSTR (rec.last_name, 1, 20);
         END IF;

         process_employee_history (rec.last_name, rec.today, USER);
      END LOOP;
   END;

There are at least three different examples of unnecessary code execution here:

1. I call SYSDATE repetitively within the SELECT statement, yet I am selecting only the date and not the time. Assuming that this block of code always starts and finishes within the same day (something one should be able to determine without too much trouble), I can move this computation out of the query and into a local variable. Not only will I avoid multiple formatting calls, but I also perform just one SELECT FROM dual, which is how SYSDATE is implemented in PL/SQL.

2. Within the loop, I make sure that I do not pass last names with more than 20 characters. An alternative approach is to perform the SUBSTR within the query and skip the LENGTH check entirely. If many of the strings are short (< 20 characters), this change might not actually improve performance.

3. I call the USER function for each iteration of the loop. Does the value returned by USER change during this time? Not at all. It will always give me the same value during a connection to Oracle. So I should call this just once, cache that value in memory (in a local variable or, perhaps even better, a packaged constant, so it is accessible throughout my session), and avoid many SELECT FROM dual calls.

Here is a revised version of my block:

   DECLARE
      v_today VARCHAR2(20) := TO_CHAR (SYSDATE, 'MM/DD/YYYY');
      v_user  VARCHAR2(30) := USER;

      CURSOR emp2_cur
      IS
         SELECT SUBSTR (last_name, 1, 20) last_name
           FROM employee;
   BEGIN
      FOR rec IN emp2_cur
      LOOP
         process_employee_history (rec.last_name, v_today, v_user);
      END LOOP;
   END;
   /

You can compare the performance of these two approaches by running the someplsql.tst script.
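If you want the cached USER value to be available throughout your session, a packaged constant along the following lines would do the job (a sketch only; the package name session_info is just an illustration):

   CREATE OR REPLACE PACKAGE session_info
   IS
      -- Evaluated once, when the package is first referenced in the session.
      c_user CONSTANT VARCHAR2(30) := USER;
   END session_info;
   /

Any program in the session can then reference session_info.c_user without incurring another call to USER.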


Check Your SQL

Suppose I have fully tuned my SQL, using one or another explain plan utility, or a third-party analysis and tuning tool -- and I still find the PL/SQL program in which the SQL is executed to be running slowly. The problem may well be that the SQL statement itself has introduced unnecessary or excessive execution -- and in a number of these situations, the judicious use of PL/SQL can improve your performance. Consider the following query:

   SELECT 'Top employee in ' || department_id || ': ' ||
          E.last_name || ', ' || E.first_name str
     FROM employee_big E
    WHERE E.salary = (SELECT MAX (salary)
                        FROM employee E2
                       WHERE E2.department_id = E.department_id);

I use this query to display those employees who earn the highest salaries in their respective departments. I can very easily write a correlated subquery to handle this requirement. The problem with this approach is that the maximum salary for a given department will be recomputed many, many times (assuming a large number of employees). I can rewrite this single query into nested PL/SQL loops that get the job done much more efficiently:

   DECLARE
      -- Receives the display string for each top employee.
      str VARCHAR2(2000);

      CURSOR dept_cur
      IS
         SELECT department_id, MAX (salary) max_salary
           FROM employee_big E
          GROUP BY department_id;

      CURSOR emp_cur (
         dept IN department.department_id%TYPE,
         maxsal IN NUMBER)
      IS
         SELECT 'Top employee in ' || department_id || ': ' ||
                last_name || ', ' || first_name str
           FROM employee_big
          WHERE department_id = dept
            AND salary = maxsal;
   BEGIN
      FOR dept_rec IN dept_cur
      LOOP
         FOR rec IN emp_cur (dept_rec.department_id, dept_rec.max_salary)
         LOOP
            str := rec.str;
         END LOOP;
      END LOOP;
   END;
   /

Run the script found in the useplsql.tst file to compare the performance of these two approaches. [Note: the useplsql.tst script also includes another SQL-only implementation involving an in-line view that matches the performance -- in this case -- of the PL/SQL nested loop. There are generally a number of different ways to write SQL and PL/SQL to obtain the correct answer. Criteria for determining the optimal implementation should involve not only performance, but also the accessibility of the code (to enhance, maintain, etc.).]
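For reference, an in-line view implementation of the kind mentioned above might be written along these lines (a sketch, assuming the same employee_big columns; the actual query in useplsql.tst may differ):

   SELECT 'Top employee in ' || E.department_id || ': ' ||
          E.last_name || ', ' || E.first_name str
     FROM employee_big E,
          (SELECT department_id, MAX (salary) max_salary
             FROM employee_big
            GROUP BY department_id) M
    WHERE E.department_id = M.department_id
      AND E.salary = M.max_salary;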

Defer Execution Till Needed

Just because you have a declaration section positioned "before" your executable section does not mean that you should declare all your program's variables there. It is quite possible that some actions taken in that section (the declarations themselves or the defaulting code) are not always needed and should not always be run on startup of the block.


Consider the following block of code:

   PROCEDURE always_do_everything (criteria_in IN BOOLEAN)
   IS
      big_string VARCHAR2(32767) := ten_minute_lookup (...);

      big_list list_types.big_strings_tt :=
         two_minute_number_cruncher (...);
   BEGIN
      IF NOT criteria_in
      THEN
         use_big_string (big_string);
         process_big_list (big_list);
      ELSE
         /* Nothing big going on here */
         ...
      END IF;
   END;

I declare a big string, and call a function that takes ten minutes of elapsed time and lots of CPU time to assign the default value to that string. I also declare and populate a collection (via a package-declared table TYPE), again relying on a CPU-intensive function to populate that list. I take both of these steps because I know that I need to use the big_string and big_list data structures in my programs. Then I write my executable section, run some initial tests and everything seems OK, except that it runs too slowly. I decide to walk through my code to get a better understanding of its flow. I discover something very interesting: my program always declares and populates the big_string and big_list structures, but it doesn't use them unless criteria_in is FALSE, which is usually not the case! Once I have this more thorough understanding of my program's logical flow, I can take advantage of nested blocks (an anonymous block defined within another block of code) to defer the penalty of initializing my data structures until I am sure I need them. Here is a reworking of my inefficient program:

   PROCEDURE only_as_needed (criteria_in IN BOOLEAN)
   IS
   BEGIN
      IF NOT criteria_in
      THEN
         DECLARE
            big_string VARCHAR2(32767) := ten_minute_lookup (...);

            big_list list_types.big_strings_tt :=
               two_minute_number_cruncher (...);
         BEGIN
            use_big_string (big_string);
            process_big_list (big_list);
         END;
      ELSE
         /* Nothing big going on here */
         ...
      END IF;
   END;

One other advantage of this approach is that when the nested block terminates, the memory associated with its data structures is released. This behavior would come in very handy in the above example if I needed to perform lots more operations in my program after I am done with my "big" variables. The former approach meant that memory would not be released until the entire program was done.


Be a Good Listener

Are you a good listener? When people speak, do you expend more effort figuring out how you will respond, instead of attempting to truly understand what they mean? Being a good listener is, I believe, a sign of respect for others, and is a skill we should all cultivate (I know that I need to make more of an effort). Being a good listener is also a critical skill when a programmer uncovers requirements from users and translates them into code. All too often, we hear what our users say, but we do not really listen. The consequence is that we end up writing code that either simply does not meet the requirements, or does so in an inefficient manner. Consider the following example:

   CREATE OR REPLACE PROCEDURE remove_dept (
      deptno_in IN NUMBER,
      new_deptno_in IN NUMBER)
   IS
      emp_count NUMBER;
   BEGIN
      SELECT COUNT(*)
        INTO emp_count
        FROM emp
       WHERE deptno = deptno_in;

      IF emp_count > 0
      THEN
         UPDATE emp
            SET deptno = new_deptno_in
          WHERE deptno = deptno_in;
      END IF;

      DELETE FROM dept
       WHERE deptno = deptno_in;
   END remove_dept;

This procedure drops a department from the department table, but before it does that, it reassigns any employees in that department to another. The logic of the program is as follows: if I have any employees in that department, perform the update effecting the transfer. Then delete that row from the department table. Can you see what is wrong with this program? Actually, there are two different levels at which this program is objectionable. Most fundamentally, a good part of the code is unnecessary. If an UPDATE statement does not identify any rows to change, it does not raise an error; it simply doesn't do anything. So the remove_dept procedure could be reduced to nothing more than:

   CREATE OR REPLACE PROCEDURE remove_dept (
      deptno_in IN NUMBER,
      new_deptno_in IN NUMBER)
   IS
   BEGIN
      UPDATE emp
         SET deptno = new_deptno_in
       WHERE deptno = deptno_in;

      DELETE FROM dept
       WHERE deptno = deptno_in;
   END remove_dept;
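If the caller really does need to know whether any employees were transferred, that information is already available after the UPDATE -- no separate query is required. A minimal sketch of the idea (the message text is illustrative):

   BEGIN
      UPDATE emp
         SET deptno = new_deptno_in
       WHERE deptno = deptno_in;

      -- SQL%ROWCOUNT reports how many rows the most recent SQL statement affected.
      IF SQL%ROWCOUNT > 0
      THEN
         DBMS_OUTPUT.PUT_LINE ('Transferred ' || SQL%ROWCOUNT || ' employees.');
      END IF;

      DELETE FROM dept
       WHERE deptno = deptno_in;
   END;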

Suppose, however, that it was necessary to perform the check for existing employees. Let's take a closer look at what really is going on here. The question I need to answer is "Is there at least one employee?", yet if you look closely at my code, the question I really answer is "How many employees do I have?". I can transform the answer to that question into the answer to my first question with a Boolean expression (emp_count > 0), but in the process I may have gone overboard in my processing. There are, in fact, a number of ways to answer the question "Do I have at least one of X?"; the path you take can have a serious impact on performance. Here are some possibilities:

1. Use COUNT(*) as shown above:

   BEGIN
      SELECT COUNT(*)
        INTO emp_count
        FROM employee
       WHERE deptno = deptno_in;

      atleastone := emp_count > 0;

2. Use an explicit cursor to test for existence by fetching once and checking the %FOUND cursor attribute:

   DECLARE
      CURSOR emp_cur
      IS
         SELECT 'x' dummy
           FROM employee
          WHERE deptno = deptno_in;

      rec emp_cur%ROWTYPE;
   BEGIN
      OPEN emp_cur;
      FETCH emp_cur INTO rec;
      atleastone := emp_cur%FOUND;
      CLOSE emp_cur;

3. Use an implicit cursor and rely on exception handling to determine the outcome:

   BEGIN
      SELECT 'x'
        INTO dummy
        FROM employee_big
       WHERE department_id = &2;

      atleastone := TRUE;
   EXCEPTION
      WHEN NO_DATA_FOUND
      THEN
         atleastone := FALSE;
      WHEN TOO_MANY_ROWS
      THEN
         atleastone := TRUE;
   END;

4. Get fancy with SQL. There are a number of things you can do to reduce the potential excesses of COUNT or otherwise cut down on the work performed by the SQL engine to get its answer. Here are some examples:

   BEGIN
      SELECT COUNT(1)
        INTO dummy
        FROM employee_big
       WHERE department_id = &2
         AND ROWNUM < 2;

      atleastone := dummy > 0;

   BEGIN
      SELECT 1
        INTO dummy
        FROM dual
       WHERE EXISTS (SELECT 'x'
                       FROM employee_big
                      WHERE department_id = &2);

      atleastone := dummy IS NOT NULL;

Wow! Those are fun to write, I guess. So you can use ROWNUM to cut down on COUNT's work. You can even use the EXISTS operator and a subquery to get the job done. The games SQL developers play! Which of all these techniques yields, however, the most efficient program? You can run the atleastone.sql script to compare each of these different approaches (note: the beginning of the script creates a rather large copy of the employee table; this code is commented out to avoid the overhead of this step when the table is already in place. You will want to uncomment this section the first time you try the script.) Here is output typical for this script (each different approach is executed 1000 times, checking for the existence of at least one employee in department 20):

   SQL> @atleastone 1000 20
   Implicit          Elapsed: .84 seconds. Factored: .00084 seconds.
   Explicit          Elapsed: .34 seconds. Factored: .00034 seconds.
   COUNT             Elapsed: 4.22 seconds. Factored: .00422 seconds.
   COUNT ROWNUM < 2  Elapsed: .27 seconds. Factored: .00027 seconds.
   EXISTS            Elapsed: .36 seconds. Factored: .00036 seconds.

The explicit cursor along with the two "clever" implementations score the best benchmark results. You can also see that COUNT(*) is much slower than any other approach. Given these results, I would generally choose to answer the question "Is there at least one?" with a straightforward, efficient explicit cursor. Even though the COUNT(1) with ROWNUM < 2 is a bit faster, it is overly complex (as is its EXISTS partner). The incremental improvement in performance does not offset the increased difficulty in maintaining and enhancing such code. While this scenario and the accompanying script focus on a particular requirement, I was only able to come up with a better implementation by making certain that the code I wrote (the "answer") was responsive to the "question", the user requirements.

Understand How Your Tools Work

Oracle Corporation has spent the last several years improving the PL/SQL language. We've seen our software run faster due to low-level tuning. We can do so much more with PL/SQL than ever before, due to language enhancements. Of particular importance is the addition of so many built-in packages to our repertoire. Few developers have, however, enough time to learn all the available, new techniques, and even less time to figure out how best to take advantage of all this great stuff. The code we write in this environment of almost forced ignorance can have significant problems. Use (or misuse) of the DBMS_SQL package offers an example of this dynamic. DBMS_SQL is the only way (prior to Oracle8i) to execute dynamic SQL: SQL statements and PL/SQL blocks that are constructed and executed at runtime. DBMS_SQL is far and away the most complicated to use of the built-in packages. Improper use of the package can lead to serious performance issues. Consider the following block of code:

   CREATE OR REPLACE PROCEDURE no_justice
   IS
      cur INTEGER;
      rows_updated INTEGER;
   BEGIN
      FOR rec IN (SELECT name, bonus
                    FROM ceo_compensation
                   WHERE layoffs > 1000)
      LOOP
         cur := DBMS_SQL.OPEN_CURSOR;

         DBMS_SQL.PARSE (
            cur,
            'UPDATE ill_gotten_gains
                SET compensation = compensation + ' || rec.bonus ||
            ' WHERE slave_to_profits = ''' || rec.name || '''',
            DBMS_SQL.native);

         rows_updated := DBMS_SQL.EXECUTE (cur);

         DBMS_SQL.CLOSE_CURSOR (cur);
      END LOOP;
   END;

This program is straightforward enough: add the bonus received by each CEO who has laid off more than 1,000 employees to their total compensation package. Truly ill-gotten gains (and also not an example that requires the use of DBMS_SQL, but please bear with me for the purpose of illustrating the issue). At first glance, the construction of this program seems quite logical: open a dynamic SQL cursor, parse the UPDATE statement with the specific values for that CEO, execute the UPDATE and close the cursor. A simple test plan confirms that the code works just fine. Yet this program actually has two serious flaws, in terms of efficient execution:

1. When you work with DBMS_SQL, you do not need to open and close a cursor for each separate SQL statement that you execute. You can open one cursor and use that cursor for any number of completely different statements, such as a query, followed by a delete, followed by a PL/SQL block.


2. In the no_justice procedure I use concatenation to place the specific values for a CEO into the string. While this works, it is not nearly as efficient as using bind variables. If I concatenate literal values into my SQL strings, then each SQL string is physically different and must be parsed individually by the SQL engine. If, on the other hand, I use bind variables (and placeholders in the SQL string itself), I can parse just once and do nothing more than bind with each row fetched from the cursor.

Here is a rewrite of the no_justice procedure incorporating these changes:

   CREATE OR REPLACE PROCEDURE no_justice
   IS
      cur INTEGER := DBMS_SQL.OPEN_CURSOR;
      rows_updated INTEGER;
   BEGIN
      DBMS_SQL.PARSE (
         cur,
         'UPDATE ill_gotten_gains
             SET compensation = compensation + :bloodMoney
           WHERE slave_to_profits = :theCEO',
         DBMS_SQL.native);

      FOR rec IN (SELECT name, bonus
                    FROM ceo_compensation
                   WHERE layoffs > 1000)
      LOOP
         DBMS_SQL.BIND_VARIABLE (cur, 'bloodMoney', rec.bonus);
         DBMS_SQL.BIND_VARIABLE (cur, 'theCEO', rec.name);

         rows_updated := DBMS_SQL.EXECUTE (cur);
      END LOOP;

      DBMS_SQL.CLOSE_CURSOR (cur);
   END;

Now I only execute code that actually needs to be run at any given point. The result, depending on the complexity of the SQL statement and the overhead of parsing, can be dramatically lower execution times. You can run the effdsql.tst script to compare the performance of a few different approaches to this same update.
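As an aside, on Oracle8i and later the same update could also be written with native dynamic SQL. A minimal sketch follows, assuming the same tables and columns (the procedure name no_justice_nds is just for illustration). Note that EXECUTE IMMEDIATE issues a parse call on each execution, although the use of bind variables keeps the SQL text constant and those parses cheap:

   CREATE OR REPLACE PROCEDURE no_justice_nds
   IS
   BEGIN
      FOR rec IN (SELECT name, bonus
                    FROM ceo_compensation
                   WHERE layoffs > 1000)
      LOOP
         -- Bind variables keep the statement text identical, so it is reused in the shared pool.
         EXECUTE IMMEDIATE
            'UPDATE ill_gotten_gains
                SET compensation = compensation + :bloodMoney
              WHERE slave_to_profits = :theCEO'
            USING rec.bonus, rec.name;
      END LOOP;
   END no_justice_nds;
   /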

Lean, Mean Executing Maintainable Machine

That's what our programs should be and do: write and run just the code needed to get the job done -- and write it in a way that makes it easy to enhance over time. This paper offered a number of different examples that will hopefully make it easier for you to visit your own programs and identify areas for improvement. And as you do encounter your own opportunities for improvement and perhaps some other excellent illustrations of "unnecessary code", I encourage you to post your examples (before and after) on the PL/SQL Pipeline's Pipetalk "Debugging, Tuning and Tracing" conference.