08 April, 2009

The trouble with DBMS_XPLAN...

As some of you may know, I used to be the Product Development Director for Hotsos. Part of my job entailed work the Laredo product, which helps identify potentials performance problem areas when making changes to your database structure, indexes, statistics, init parameters, etc.

Recently I was called back in on a particularly sticky problem that one of their customers was having in using Laredo. Without going into a long and arduous explanation, the problem boiled down to the fact that Laredo was using DBMS_XPLAN to emit formatted explain plans to the user and they were coming out wrong.

I suppose you would have expected me to learn my lesson by now, but I kind of expect the Oracle documentation to be correct for the most part. So when we looked up DBMS_XPLAN.DISPLAY in the PL/SQL Packages and Types Reference for 10g, I kind of expected to be able to take the code example and use it verbatim.

Here is what it said:

DISPLAY Function

This table function displays the contents of the plan table.

In addition, you can use this table function to display any plan (with or without statistics) stored in a table as long as the columns of this table are named the same as columns of the plan table (or V$SQL_PLAN_STATISTICS_ALL if statistics are included). You can apply a predicate on the specified table to select rows of the plan to display.

Syntax

DBMS_XPLAN.DISPLAY(
table_name IN VARCHAR2 DEFAULT ‘PLAN_TABLE’,
statement_id IN VARCHAR2 DEFAULT NULL,
format IN VARCHAR2 DEFAULT ‘TYPICAL’,
filter_preds IN VARCHAR2 DEFAULT NULL);

statement_id

Specifies the statement_id of the plan to be displayed. This parameter defaults to NULL, which is the default when the EXPLAIN PLAN command is executed without a set statement_id clause.If no statement_id is specified, the function will show you the plan of the most recent explained statement.

And even came with an example that showed how to use it:

To display the plan for a statement identified by ‘foo’, such as statement_id=’foo’:

SELECT * FROM table (DBMS_XPLAN.DISPLAY(‘plan_table’, ‘foo’));

So, when we coded a statement that went into the product that looked like the following, I would have expected to be shown the explain plan output for the correct statement.

select * from table(DBMS_XPLAN.DISPLAY(‘LAR_SCENARIO_PLAN_VW’,’82-2476-34’, ‘BASIC’))

But instead what we got was THIS:

200904071835.jpg

Notice that we get two “0” IDs and two “1” IDs. And they are OBVIOUSLY from two different statements! WHAT THE….

Now I know I specifically told it to get me only the the plan where “82-2476-34” was that STATEMENT_ID. Did I get the ordinal position of the parameters wrong? I tried again with the following statement:

select *
from table(DBMS_XPLAN.DISPLAY(TABLE_NAME =>’LAR_SCENARIO_PLAN_VW’,
STATEMENT_ID =>;’82-2476-34’,
FORMAT=>;’BASIC’))

Unfortunately, I received the same result. SO, I decided to go digging into the underlying plan table to see what was going on. I decided to select all rows where the STATEMENT_ID was equal to ‘82-2476-34’ to see if there was a duplicate STATEMENT_ID in the table. No luck.

200904071920.jpg

Hmm... That doesn't make any sense! There has to be something going on that makes DBMS_XPLAN bring back extraneous rows. My search then turned to the extraneous rows to try to find what they had in common with the rows that should have been coming back. So I started looking at the expanded output of DBMS_XPLAN and trudged through the base table examining the individual attributes to see where DMBS_XPLAN might have been losing it's mind.

The search finally lead me to PLAN_ID. If I selected all the records from the base table where the PLAN_ID = 7191, the same PLAN_ID as my original statement, here is what I got.

200904071926.jpg

EUREKA! But WHY would DBMS_XPLAN bring back other statement ID's when I specifically passed in statement id "82-2476-34”? Luckily DBMS_XPLAN is not wrapped so I was able to use SQL-DEVELOPER's debug feature to walk through the execution of DMBS_XPLAN and grab the explain plan SQL.

SELECT
/*+ opt_param('parallel_execution_enabled','false') */
/* EXEC_FROM_DBMS_XPLAN */
id,
position,
depth,
operation,
OPTIONS,
object_name,
cardinality,
bytes,
temp_space,
cost,
io_cost,
cpu_cost,
TIME,
partition_start,
partition_stop,
object_node,
other_tag,
distribution,
projection,
access_predicates,
filter_predicates,
NULL,
qblock_name,
object_alias,
NVL( other_xml, remarks ) other_xml,
NULL sql_profile,
NULL sql_plan_baseline,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL,
NULL
FROM
LAREDO.LAR_SCENARIO_PLAN_VW
WHERE
plan_id =
(SELECT
MAX( plan_id )
FROM
LAREDO.LAR_SCENARIO_PLAN_VW
WHERE id =0
AND statement_id = :stmt_id)

ORDER BY id

The problem shows up very clearly in the generated SQL. The WHERE clause is taking the STATEMENT_ID I passed and getting the MAX(PLAN_ID) and selecting ALL records that have that PLAN_ID. But what if, as in my case, there are multiple statements with the same plan ID?

OK. Well, since DBMS_XPLAN provides the ability to add filter predicates, I'll try call the procedure as follows:

select * from table(DBMS_XPLAN.DISPLAY(‘LAR_SCENARIO_PLAN_VW’,’82-2476-34’, ‘BASIC’, 'STATEMENT_ID= ''82-2476-34'''))

Adding the filter predicate changed the where clause, but not as I thought it might.

...

FROM
LAREDO.LAR_SCENARIO_PLAN_VW
WHERE
1=1
AND statement_id = :stmt_id
AND STATEMENT_ID = '82-2476-34'
ORDER BY id

WHAT THE %*@#? ...

Ok... I walked through the code again a little more carefully and came to the conclusion that this is just a run of the mill code error. Oracle are building pieces of the where clause and pasting them together based on certain criteria, mixing and matching based on what your PLAN_TABLE looks like. When it comes to the part where they paste in the STATEMENT_ID that you pass in, if you haven't passed anything to FILTER_PRED, they forget to paste the predicates limiting the selection to your chosen STATEMENT ID in all the necessary places.

While the resulting WHERE CLAUSE you get by actually passing something in to FILTER_PRED is not technically correct, it does work.

I hope this might save some people some time, if they run into something similar. In the mean time I'll be looking into METALINK to see it the bug is logged, and if not I'll log it.

Happy Coding!