DBA Sensation

April 12, 2012

Set Oracle SGA > 256GB

Filed under: [Installation] - zhefeng @ 2:06 pm

I had a request to install Oracle 11gR2 on a server with 2TB of memory. The installation failed in DBCA with a complaint that it couldn't reach shared memory.

I checked Metalink and didn't find any solution. My colleague told me he had hit the same issue before, and Oracle told him to set the SGA to less than 256 GB as a "workaround".

I followed the "workaround" and continued my installation. Later I did some research and found this:

 

Solution

After checking the swap and the kernel parameters, everything was set as recommended by Oracle. Investigating the issue further, it seems that this is caused by the prelink command, which calculates shared library load addresses and updates the shared libraries with them. The simplest thing to do is to undo what prelink did and disable it:
prelink -ua
sed -i 's/PRELINKING=yes/PRELINKING=no/' /etc/sysconfig/prelink

 

From: https://support.oracle.com/CSP/ui/flash.html#tab=KBHome%28page=KBHome&id=%28%29%29,%28page=KBNavigator&id=%28bmDocTitle=Why%20not%20able%20to%20allocate%20a%20more%20SGA%20than%20193G%20on%20Linux%2064?&from=BOOKMARK&bmDocType=HOWTO&bmDocID=1241284.1&viewingMode=1143&bmDocDsrc=KB%29%29

Doc ID: 1241284.1

I haven't tried it yet. If anyone has the same problem, give it a try and let me know.
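If you want to see where a box stands before running the two commands, here is a minimal sketch. It assumes the RHEL-style config file /etc/sysconfig/prelink from the note above; the helper name prelink_setting is mine, not something from the Oracle note.

```shell
#!/bin/sh
# prelink_setting FILE
# Print the value of the PRELINKING= line in FILE, or "unset" if there is none.
# (Hypothetical read-only helper; it changes nothing on the system.)
prelink_setting() {
    _ps_value=$(sed -n 's/^PRELINKING=//p' "$1" | tail -n 1)
    echo "${_ps_value:-unset}"
}

# Example:
#   prelink_setting /etc/sysconfig/prelink
```

A result of "yes" would mean the sed line above still has something to change; "no" would mean prelinking is already disabled in the config.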

March 2, 2011

Recreating spfile on ASM storage from pfile

Filed under: [backup and recovery] - zhefeng @ 2:46 pm

Sometimes when you have screwed up the parameters, you need to use the pfile as a stepping stone to undo the changes in the spfile. How do you do that if your spfile sits on ASM storage? Here is a workaround.

1. screw up a db parameter on purpose
SQL> show parameter memory

NAME TYPE VALUE
———————————— ———– ——————————
hi_shared_memory_address integer 0
memory_max_target big integer 1520M
memory_target big integer 1520M
shared_memory_address integer 0
SQL> alter system set memory_max_target=0 scope=spfile;
System altered.

2. now bounce the instance, db will complain about the new settings
SQL> shutdown
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORA-01078: failure in processing system parameters
ORA-00837: Specified value of MEMORY_TARGET greater than MEMORY_MAX_TARGET

3. in my case the spfile sits on ASM
ASMCMD> ls -l spfile*
Type Redund Striped Time Sys Name
N spfileorcl.ora => +DATA/ORCL/PARAMETERFILE/spfile.267.744731331

4. what we need to do is create a pfile from the spfile, change the parameter back to a valid value, then start the db from the pfile
1). With the db down, we can still create a pfile from the spfile:
SQL> create pfile from spfile='+DATA/orcl/spfileorcl.ora';
2). modify the value in the pfile 'initorcl.ora'
$ vi initorcl.ora
*.memory_max_target=1583349760
3). start up the db with the pfile
SQL> startup mount
-- now it will use the pfile

5. create the new spfile on ASM storage from the "good" pfile
SQL> create spfile='+DATA/ORCL/spfileorcl.ora' from pfile;
File created.

6. note that the file name in ASM storage has changed, which means we have a new spfile:
ASMCMD> ls -l spfile*
Type Redund Striped Time Sys Name
N spfileorcl.ora => +DATA/ORCL/PARAMETERFILE/spfile.267.744733351

7. now change the pfile back to being the "bootstrap" for the correct spfile
$ cat initorcl.ora
spfile='+DATA/ORCL/spfileorcl.ora'

8. restart the database; it will pick up the correct spfile again
$ sqlplus / as sysdba
SQL> startup
ORACLE instance started.

Total System Global Area 1586708480 bytes
Fixed Size 2213736 bytes
Variable Size 973080728 bytes
Database Buffers 603979776 bytes
Redo Buffers 7434240 bytes
Database mounted.
Database opened.

SQL> show parameter spfile

NAME TYPE VALUE
———————————— ———– ——————————
spfile string +DATA/orcl/spfileorcl.ora

SQL> show parameter memory

NAME TYPE VALUE
———————————— ———– ——————————
hi_shared_memory_address integer 0
memory_max_target big integer 1520M
memory_target big integer 1520M
shared_memory_address integer 0
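
Step 4 edits the pfile by hand in vi. For a scripted run, the same edit could look roughly like the sketch below. It assumes the parameter sits on its own line as *.name=value, as in the pfile above; set_pfile_param is a made-up helper name, and it is worth keeping a copy of the pfile before touching it.

```shell
#!/bin/sh
# set_pfile_param FILE NAME VALUE
# Drop any existing "*.NAME=..." line from a text pfile and append the new one.
# The parameter moves to the end of the file.
# (Hypothetical helper, not an Oracle tool.)
set_pfile_param() {
    [ -f "$1" ] || return 1
    _spp_tmp=$(mktemp) || return 1
    { grep -v "^\*\.$2=" "$1"; echo "*.$2=$3"; } > "$_spp_tmp"
    # Copy the content back so the pfile keeps its owner and permissions.
    cat "$_spp_tmp" > "$1"
    rm -f "$_spp_tmp"
}

# Example, matching step 4 above:
#   set_pfile_param initorcl.ora memory_max_target 1583349760
```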

September 29, 2010

root.sh failed on 2nd node when installing Grid Infrastructure

Filed under: [RAC] - zhefeng @ 12:39 pm

When I was running root.sh as the last step of the grid infrastructure installation on the second node, it failed with the following errors (it had succeeded on the 1st node):
-------------------------------------------------------
DiskGroup DATA1 creation failed with the following message:
ORA-15018: diskgroup cannot be created
ORA-15072: command requires at least 1 regular failure groups, discovered only 0

Oracle gives the reason: when you are using multipathing storage for ASM, you have to pre-configure the oracleasm file as below:

On all nodes,

1. Modify the /etc/sysconfig/oracleasm with:

ORACLEASM_SCANORDER="dm"
ORACLEASM_SCANEXCLUDE="sd"

2. restart asmlib (on every node except the 1st):
# /etc/init.d/oracleasm restart

3. deconfigure the root.sh settings on nodes except 1st node:
$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force

4. Run root.sh again on the 2nd node (or other nodes)

Oracle Metalink Doc:
11GR2 GRID INFRASTRUCTURE INSTALLATION FAILS WHEN RUNNING ROOT.SH ON NODE 2 OF RAC [ID 1059847.1]

September 27, 2010

how to deinstall the failed 11gR2 grid infrastructure

Filed under: [RAC] - zhefeng @ 10:39 am

Two parts are involved: first deconfigure, then deinstall

Deconfigure and Reconfigure of Grid Infrastructure Cluster:

Identify the cause of the root.sh failure by reviewing the logs in $GRID_HOME/cfgtoollogs/crsconfig and $GRID_HOME/log. Once the cause is identified, deconfigure and reconfigure with the steps below. Keep in mind that you need to wait until each step finishes successfully before moving on to the next one:

For Steps 1 and 2, you can skip any node(s) on which you haven't executed root.sh yet.

Step 1: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force" on all nodes, except the last one.

Step 2: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode" on the last node. This command will also zero out the OCR and voting disk (VD).

Step 3: As root, run $GRID_HOME/root.sh on the first node.

Step 4: As root, run $GRID_HOME/root.sh on all other node(s), except the last one.

Step 5: As root, run $GRID_HOME/root.sh on the last node.

Deinstall of Grid Infrastructure Cluster:

Case 1: "root.sh" never ran on this cluster. As the grid user, execute $GRID_HOME/deinstall/deinstall

Case 2: "root.sh" has already run. Follow the steps below, and keep in mind that you need to wait until each step finishes successfully before moving on to the next one:

Step 1: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force" on all nodes, except the last one.

Step 2: As root, run "$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode" on the last node. This command will also zero out the OCR and voting disk (VD).

Step 3: As the grid user, run $GRID_HOME/deinstall/deinstall
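
Since the only difference between the deconfig steps is the -lastnode flag on the final node, it is easy to get the order wrong on a bigger cluster. Below is a small dry-run sketch that only prints which command goes to which node. The helper name deconfig_plan and the node names in the example are assumptions for illustration; nothing is executed.

```shell
#!/bin/sh
# deconfig_plan NODE...
# Print, in order, the deconfig command to run as root on each node.
# Only the last node listed gets -lastnode.
# (Hypothetical dry-run helper: it prints commands and runs none of them.)
deconfig_plan() {
    _dp_cmd='$GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force'
    while [ "$#" -gt 1 ]; do
        echo "$1: $_dp_cmd"
        shift
    done
    if [ "$#" -eq 1 ]; then
        echo "$1: $_dp_cmd -lastnode"
    fi
}

deconfig_plan vmrac01 vmrac02
# vmrac01: $GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force
# vmrac02: $GRID_HOME/crs/install/rootcrs.pl -verbose -deconfig -force -lastnode
```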

September 7, 2010

Oracle 10g ASM/RAW storage migration

Filed under: [RAC] - zhefeng @ 9:47 am

Objective:
we want to migrate the whole shared storage from old SAN to new SAN without re-installing the whole Oracle RAC

Scenario:
1.Current structure
[Nodes]
## eth1-Public
10.0.0.101 vmrac01 vmrac01.test.com
10.0.0.102 vmrac02 vmrac02.test.com
## eth0-Private
192.168.199.1 vmracprv01 vmracprv01.test.com
192.168.199.2 vmracprv02 vmracprv02.test.com
## VIP
10.0.0.103 vmracvip01 vmracvip01.test.com
10.0.0.104 vmracvip02 vmracvip02.test.com

[Storage]
Both ORACLE_HOME are local:
ORACLE_HOME=/database/oracle/10grac/db
CRS_HOME=/database/oracle/10grac/crs

Shared LUN display (3 partitions, 2*256M for OCR&VOTING, 1*20G for ASM)
Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 32 257008+ 83 Linux
/dev/sdb2 33 64 257040 83 Linux
/dev/sdb3 65 2610 20450745 83 Linux

OCR and Voting are on RAW device: /dev/sdb1 /dev/sdb2

ASM disks
bash-3.1$ export ORACLE_SID=+ASM1
bash-3.1$ asmcmd
ASMCMD> lsdg
State Type Rebal Unbal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Name
MOUNTED EXTERN N N 512 4096 1048576 19971 17925 0 17925 0 DG1/

2. New storage (sdc 10G)
1). new LUN added
[root@vmrac01 bin]# fdisk -l

Disk /dev/sda: 26.8 GB, 26843545600 bytes
255 heads, 63 sectors/track, 3263 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 535 4192965 82 Linux swap / Solaris
/dev/sda3 536 3263 21912660 83 Linux

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 32 257008+ 83 Linux
/dev/sdb2 33 64 257040 83 Linux
/dev/sdb3 65 2610 20450745 83 Linux

Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

2). Partition the new LUN to 3 partitions
Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdc1 1 32 257008+ 83 Linux
/dev/sdc2 33 64 257040 83 Linux
/dev/sdc3 65 1305 9968332+ 83 Linux

3). clone data from previous raw disks
**shutdown db and crs first to make sure there is no change for raw disks!
#dd if=/dev/raw/raw1 of=/dev/sdc1
514017+0 records in
514017+0 records out
263176704 bytes (263 MB) copied, 252.812 seconds, 1.0 MB/s

#dd if=/dev/raw/raw2 of=/dev/sdc2
514080+0 records in
514080+0 records out
263208960 bytes (263 MB) copied, 267.868 seconds, 983 kB/s
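
The record counts above (514017 records for 263176704 bytes) show dd running at its default 512-byte block size, which may be part of the reason for the roughly 1 MB/s rate. Here is a hedged sketch of a copy with a larger block size followed by a byte-for-byte check. It assumes GNU dd and GNU cmp (for bs=1M and -n); clone_and_verify is a made-up name, and I have only reasoned about it against plain files, not raw devices.

```shell
#!/bin/sh
# clone_and_verify SRC DST
# Copy SRC to DST with 1 MiB blocks, then compare the first size-of-SRC bytes.
# (Hypothetical helper; assumes GNU dd and GNU cmp. DST gets overwritten.)
clone_and_verify() {
    dd if="$1" of="$2" bs=1M 2>/dev/null || return 1
    _cv_bytes=$(wc -c < "$1" | tr -d ' ')
    cmp -n "$_cv_bytes" "$1" "$2"
}

# Example (with db and crs shut down, as above):
#   clone_and_verify /dev/raw/raw1 /dev/sdc1 && echo "raw1 clone verified"
```

Limiting the comparison to the size of the source keeps the check meaningful when the target partition is slightly larger than the source.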

4). "cheating" Oracle by re-binding to the new devices on both nodes
**old binding
Step1: add entries to /etc/udev/rules.d/60-raw.rules
ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdb2", RUN+="/bin/raw /dev/raw/raw2 %N"

Step2: For the mapping to take effect immediately, run the commands below
#raw /dev/raw/raw1 /dev/sdb1
#raw /dev/raw/raw2 /dev/sdb2

Step3: Run the following commands and add them to the /etc/rc.local file.
#chown oracle:dba /dev/raw/raw1
#chown oracle:dba /dev/raw/raw2
#chmod 660 /dev/raw/raw1
#chmod 660 /dev/raw/raw2
#chown oracle:dba /dev/sdb1
#chown oracle:dba /dev/sdb2
#chmod 660 /dev/sdb1
#chmod 660 /dev/sdb2

**new binding on both nodes
Step1: edit /etc/udev/rules.d/60-raw.rules
ACTION=="add", KERNEL=="sdc1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdc2", RUN+="/bin/raw /dev/raw/raw2 %N"

Step2: map immediately
#raw /dev/raw/raw1 /dev/sdc1
#raw /dev/raw/raw2 /dev/sdc2

Step3: set permissions and update /etc/rc.local
#chown oracle:dba /dev/raw/raw1
#chown oracle:dba /dev/raw/raw2
#chmod 660 /dev/raw/raw1
#chmod 660 /dev/raw/raw2
#chown oracle:dba /dev/sdc1
#chown oracle:dba /dev/sdc2
#chmod 660 /dev/sdc1
#chmod 660 /dev/sdc2

5). startup crs and oracle db, check the database, everything works fine after switching the raw disks!

3. ASM disk group migration
1). Mark the new disk sdc3 on one node
# /etc/init.d/oracleasm createdisk VOL2 /dev/sdc3
Marking disk "/dev/sdc3" as an ASM disk: [ OK ]

2). scan disk on the other node
[root@vanpgvmrac02 bin]# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]

3). now verify the new disk is marked on both nodes
[root@vmrac01 disks]# /etc/init.d/oracleasm listdisks
VOL1
VOL2

[root@vmrac02 bin]# /etc/init.d/oracleasm listdisks
VOL1
VOL2

4). add new disk to DISKGROUP (under asm instance)
$export ORACLE_SID=+ASM1
$sqlplus / as sysdba
SQL> alter diskgroup DG1 add disk 'ORCL:VOL2';
-- wait for rebalancing to finish
SQL> select * from v$asm_operation;

5). remove old disk from DISKGROUP
SQL> alter diskgroup DG1 drop disk VOL1;
-- wait until rebalancing has finished
SQL> select * from v$asm_operation;
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES
------------ --------- ----- ----- ------ ----- -------- -------- -----------
           1 REBAL     RUN       1      1     2     1374       30          45

6). verify the database and asm, everything is ok!

7). clean up the old disk configurations
[root@vmrac01 bin]# /etc/init.d/oracleasm deletedisk VOL1
Removing ASM disk "VOL1": [ OK ]
[root@vmrac01 bin]# /etc/init.d/oracleasm listdisks
VOL2

[root@vmrac02 ~]# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]
[root@vmrac02 ~]# /etc/init.d/oracleasm listdisks
VOL2

8). wipe-off the partitions for sdb.

Reference:
1. Exact Steps To Migrate ASM Diskgroups To Another SAN Without Downtime. [ID 837308.1]
2. Previous doc “VMRAC installation” task 130.2008.09.12
3. OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE), including moving from RAW Devices to Block Devices. [ID 428681.1]
4. ASM using ASMLib and Raw Devices

http://www.oracle-base.com/articles/10g/ASMUsingASMLibAndRawDevices.php

June 9, 2010

Something about checkpoints

Filed under: 1. Oracle, [System Performance tuning] - zhefeng @ 2:31 pm

I was reading an article about checkpoints on Metalink (Checkpoint Tuning and Troubleshooting Guide [ID 147468.1]).

Here are some good points about checkpoints:

Oracle writes the dirty buffers to disk only on certain conditions:
- A shadow process must scan more than one-quarter of the db_block_buffer
parameter.
- Every three seconds.
- When a checkpoint is produced.

A checkpoint is realized on five types of events:
- At each switch of the redo log files.
- When the delay for LOG_CHECKPOINT_TIMEOUT is reached.
- When the size in bytes corresponding to
(LOG_CHECKPOINT_INTERVAL * size of IO OS blocks)
is written on the current redo log file.
- Directly by the ALTER SYSTEM SWITCH LOGFILE command.
- Directly with the ALTER SYSTEM CHECKPOINT command.
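
To put a number on the third event, the threshold is just a multiplication. A tiny sketch, with made-up inputs: the interval of 10000 and the 512-byte OS block size are assumptions for illustration, not values from the note.

```shell
#!/bin/sh
# checkpoint_bytes INTERVAL OS_BLOCK_SIZE
# Bytes of redo written before an interval-driven checkpoint:
# LOG_CHECKPOINT_INTERVAL * size of an OS block.
checkpoint_bytes() {
    echo $(( $1 * $2 ))
}

# Hypothetical values: LOG_CHECKPOINT_INTERVAL=10000, 512-byte OS blocks.
checkpoint_bytes 10000 512    # prints 5120000
```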

During a checkpoint the following occurs:
- The database writer (DBWR) writes all modified database
blocks in the buffer cache back to datafiles.
- The checkpoint process (CKPT) updates the headers of all
the datafiles to indicate when the last checkpoint
occurred (SCN).

May 25, 2010

Can’t compile a stored procedure when it’s locked

Filed under: 1. Oracle, [PL/SQL dev&tuning] - zhefeng @ 10:25 am

Trying to recompile a procedure causes the application to hang
(i.e. SQL*Plus hangs after submitting the statement). Eventually ORA-4021 errors
occur after the timeout (usually 5 minutes). Here is the solution from Metalink:
Note:ID 107756.1

Error: ORA 4021
Text: time-out occurred while waiting to lock object
—————————————————————————–
Cause: While trying to lock a library object, a time-out occurred.
Action: Retry the operation later.

Solution Description
——————–

Verify that the package is not locked by another user by selecting from
V$ACCESS view. To do this, run:

SELECT * FROM v$access WHERE object = '<PACKAGE_NAME>';

Where <PACKAGE_NAME> is the package name (usually in all uppercase). If there is a row
returned, then the package is already locked and cannot be dropped until the
lock is released. Returned from the query above will be the SID that has this
locked. You can then use this to find out which session has obtained the lock.

In some cases, that session might have been killed and will not show up. If
this happens, the lock will not be released immediately. Waiting for PMON to
clean up the lock might take some time. The fastest way to clean up the lock
is to recycle the database instance.

If an ORA-4021 error is not returned and the command continues to hang after
issuing the CREATE OR REPLACE or DROP statement, you will need to do further
analysis to see where the hang is occurring. A starting point is to have a
look in v$session_wait; see the referenced NOTE.61552.1 for how to analyze hang
situations in general.

Solution Explanation
——————–

Consider the following example:

Session 1:

create or replace procedure lockit(secs in number) as
shuttime date;
begin
shuttime := sysdate + secs/(24*60*60);
while sysdate <= shuttime loop
null;
end loop;
end;
/
show err

begin
-- wait 10 minutes
lockit(600);
end;
/

Session 2:
create or replace procedure lockit as
begin
null;
end;
/

Result: hang and eventually (the timeout is 5 minutes):

create or replace procedure lockit as
*
ERROR at line 1:
ORA-04021: timeout occurred while waiting to lock object LOCKIT

Session 3:

connect / as sysdba
col owner for a10
col object for a15
select * from v$access where object = 'LOCKIT';

Result:
SID OWNER OBJECT TYPE
———- ———- ————— ————————
9 OPS$HNAPEL LOCKIT PROCEDURE

select sid, event from v$session_wait;

Result:

SID EVENT
———- —————————————————————-
9 null event

12 library cache pin

In the above result, the blocking sid 9 waits for nothing while session 12, the
hanging session, is waiting for event library cache pin.

March 12, 2010

Why Isn’t Oracle Using My Index?!

Filed under: [System Performance tuning] - zhefeng @ 4:02 pm

By Jonathan Lewis

http://www.dbazine.com/oracle/or-articles/jlewis12

The question in the title of this piece is probably the single most frequently occurring question that appears in the Metalink forums and Usenet newsgroups. This article uses a test case that you can rebuild on your own systems to demonstrate the most fundamental issues with how cost-based optimisation works. And at the end of the article, you should be much better equipped to give an answer the next time you hear that dreaded question.

Because of the wide variety of options that are available when installing Oracle, it isn't usually safe to predict exactly what will happen when someone runs a script that you have dictated to them. But I'm going to risk it, in the hope that your database is a fairly vanilla installation, with the default values for the most commonly tweaked parameters. The example has been built and tested on an 8.1.7 database with the db_block_size set to the commonly used value of 8K and the db_file_multiblock_read_count set to the equally commonly used value 8. The results may be a little different under Oracle 9.2.

Run the script from Figure 1, which creates a couple of tables, then indexes and analyses them.

create table t1 as
select
trunc((rownum-1)/15) n1,
trunc((rownum-1)/15) n2,
rpad('x', 215) v1
from all_objects
where rownum <= 3000;

create table t2 as
select
mod(rownum,200) n1,
mod(rownum,200) n2,
rpad('x',215) v1
from all_objects
where rownum <= 3000;

create index t1_i1 on t1(N1);
create index t2_i1 on t2(n1);

analyze table t1 compute
statistics;
analyze table t2 compute
statistics;

Figure 1: The test data sets.

Once you have got this data in place, you might want to convince yourself that the two sets of data are identical — in particular, that the N1 columns in both data sets have values ranging from 0 to 199, with 15 occurrences of each value. You might try the following check:

select n1, count(*)
from t1
group by n1;

and the matching query against T2 to prove the point.

If you then execute the queries:

select * from t1 where n1 = 45;
select * from t2 where n1 = 45;

You will find that each query returns 15 rows. However if you

set autotrace traceonly explain

you will discover that the two queries have different execution paths.

The query against table T1 uses the index, but the query against table T2 does a full tablescan.

So you have two sets of identical data, with dramatically different access paths for the same query.

What Happened to the Index?

Note: if you've ever come across any of those "magic number" guidelines regarding the use of indexes, e.g., "Oracle will use an index for less than 23 percent, 10 percent, 2 percent (pick number at random) of the data," then you may at this stage begin to doubt their validity. In this example, Oracle has used a tablescan for 15 rows out of 3,000, i.e., for just one half of one percent of the data!

To investigate problems like this, there is one very simple ploy that I always try as the first step: Put in some hints to make Oracle do what I think it ought to be doing, and see if that gives me any clues.

In this case, a simple hint:

/*+ index(t2, t2_i1) */

is sufficient to switch Oracle from the full tablescan to the indexed access path. The three paths with costs (abbreviated to C=nnn) are shown in Figure 2:

select * from t1 where n1 = 45;

EXECUTION PLAN
————–
TABLE ACCESS BY INDEX ROWID OF T1 (C=2)
INDEX(RANGE SCAN) OF T1_I1 (C=1)

select * from t2 where n1 = 45;

EXECUTION PLAN
————–
TABLE ACCESS FULL OF T2 (C=15)

select /*+ index(t2 t2_i1) */
*
from t2
where n1 = 45;

EXECUTION PLAN
————–
TABLE ACCESS BY INDEX ROWID OF T2 (C=16)
INDEX(RANGE SCAN) OF T2_I1 (C=1)

Figure 2: The different queries and their costs.

So why hasn't Oracle used the index by default for the T2 query? Easy: as the execution plan shows, the cost of doing the tablescan is cheaper than the cost of using the index.

Why is the Tablescan Cheaper?

This, of course, is simply begging the question. Why is the cost of the tablescan cheaper than the cost of using the index?

By looking into this question, you uncover the key mechanisms (and critically erroneous assumptions) of the Cost Based Optimiser.

Let's start by examining the indexes by running the query:

select
table_name,
blevel,
avg_data_blocks_per_key,
avg_leaf_blocks_per_key,
clustering_factor
from user_indexes;

The results are given in the table below:

                     T1     T2
Blevel                1      1
Data block / key      1     15
Leaf block / key      1      1
Clustering factor    96   3000

Note particularly the value for "data blocks per key." This is the number of different blocks in the table that Oracle thinks it will have to visit if you execute a query that contains an equality test on a complete key value for this index.

So where do the costs for our queries come from? As far as Oracle is concerned, if we fire in the key value 45, we get the data from table T1 by hitting one index leaf block and one table block — two blocks, so a cost of two.

If we try the same with table T2, we have to hit one index leaf block and 15 table blocks — a total of 16 blocks, so a cost of 16.
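
The two costs above are nothing more than a sum of block visits. As a small sketch of that simplified model (it ignores blevel and clustering_factor, like the text so far; index_cost is a name I made up):

```shell
#!/bin/sh
# index_cost LEAF_BLOCKS_PER_KEY DATA_BLOCKS_PER_KEY
# Simplified cost of an equality lookup: index leaf blocks + table blocks visited.
index_cost() {
    echo $(( $1 + $2 ))
}

index_cost 1 1     # T1: prints 2
index_cost 1 15    # T2: prints 16
```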

Clearly, according to this viewpoint, the index on table T1 is much more desirable than the index on table T2. This leaves two questions outstanding, though:

Where does the tablescan cost come from, and why are the figures for the avg_data_blocks_per_key so different between the two tables?

The answer to the second question is simple. Look back at the definition of table T1: it uses the trunc() function to generate the N1 values, dividing "rownum - 1" by 15 and truncating.

Trunc(675/15) = 45
Trunc(676/15) = 45
...
Trunc(689/15) = 45

All the rows with the value 45 do actually appear one after the other in a tight little clump (probably all fitting one data block) in the table.

Table T2 uses the mod() function to generate the N1 values, using modulus 200 on the rownum:

mod(45,200) = 45
mod(245,200) = 45
...
mod(2845,200) = 45

The rows with the value 45 appear every two hundredth position in the table (probably resulting in no more than one row in every relevant block).
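
To see the two scatter patterns side by side without a database, here is a minimal sketch that replays the two expressions from Figure 1 over rownum 1 to 3000 and prints the rownum values that end up with N1 = 45. The function names are mine, and it treats rownum order as the physical row order, as the article does.

```shell
#!/bin/sh
# Print the rownum values (1..3000) that end up with N1 = 45 in each table.

# T1: n1 = trunc((rownum-1)/15)
t1_rownums() {
    awk 'BEGIN { for (r = 1; r <= 3000; r++) if (int((r - 1) / 15) == 45) printf "%d ", r; print "" }'
}

# T2: n1 = mod(rownum, 200)
t2_rownums() {
    awk 'BEGIN { for (r = 1; r <= 3000; r++) if (r % 200 == 45) printf "%d ", r; print "" }'
}

t1_rownums    # 676 677 ... 690   (15 rows in one tight clump)
t2_rownums    # 45 245 ... 2845   (15 rows, one every 200 positions)
```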

By doing the analyze, Oracle was able to get a perfect description of the data scatter in our table. So the optimiser was able to work out exactly how many blocks Oracle would have to visit to answer our query, and in simple cases the number of block visits is the cost of the query.

But Why the Tablescan?

So we see that an indexed access into T2 is more expensive than the same path into T1, but why has Oracle switched to the tablescan?

This brings us to the two simple-minded, and rather inappropriate, assumptions that Oracle makes.

The first is that every block acquisition equates to a physical disk read, and the second is that a multiblock read is just as quick as a single block read.

So what impact do these assumptions have on our experiment?

If you query the user_tables view with the following SQL:

select
table_name,
blocks
from user_tables;

you will find that our two tables each cover 96 blocks.

At the start of the article, I pointed out that the test case was running a version 8 system with the value 8 for the db_file_multiblock_read_count.

Roughly speaking, Oracle has decided that it can read the entire 96 block table in 96/8 = 12 disk read requests.

Since it takes 16 block (= disk read) requests to access the table by index, it is clearly quicker (from Oracle's sadly deluded perspective) to scan the table; after all, 12 is less than 16.
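
The comparison Oracle is making here can be written down in a few lines. This is only a sketch of the rough arithmetic in the last two paragraphs; the function names are made up.

```shell
#!/bin/sh
# Rough read counts as described above.
tablescan_reads() { echo $(( $1 / $2 )); }    # table blocks / multiblock read count

# cheaper_path SCAN_READS INDEX_BLOCK_VISITS
cheaper_path() {
    if [ "$1" -lt "$2" ]; then echo tablescan; else echo index; fi
}

scan=$(tablescan_reads 96 8)    # 12
cheaper_path "$scan" 16         # T2 case, prints: tablescan
cheaper_path "$scan" 2          # T1 case, prints: index
```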

Voila! If the data you are targeting is suitably scattered across the table, you get tablescans even for a very small percentage of the data, a problem that can be exaggerated in the case of very big blocks and very small rows.

Correction

In fact, you will have noticed that my calculated number of scan reads was 12, whilst the cost reported in the execution plan was 15. It is a slight simplification to say that the cost of a tablescan (or an index fast full scan for that matter) is

'number of blocks' /
db_file_multiblock_read_count.

Oracle uses an "adjusted" multi-block read value for the calculation (although it then tries to use the actual requested size when the scan starts to run).

For reference, the following table compares a few of the actual and adjusted values:

Actual   Adjusted
     4      4.175
     8      6.589
    16     10.398
    32     16.409
    64     25.895
   128     40.865

As you can see, Oracle makes some attempt to protect you from the error of supplying an unfeasibly large value for this parameter.
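
Plugging the adjusted value into the division reproduces the cost of 15 from the execution plan. A hedged sketch: the lookup only knows the six values in the table above, and rounding the result up to a whole number is my assumption, chosen because it matches the reported cost. The function names are made up.

```shell
#!/bin/sh
# adjusted_mbrc ACTUAL: adjusted multiblock read value from the table above.
adjusted_mbrc() {
    case "$1" in
        4)   echo 4.175 ;;
        8)   echo 6.589 ;;
        16)  echo 10.398 ;;
        32)  echo 16.409 ;;
        64)  echo 25.895 ;;
        128) echo 40.865 ;;
        *)   return 1 ;;    # value not listed in the article
    esac
}

# tablescan_cost BLOCKS ACTUAL_MBRC: blocks / adjusted value, rounded up
# (the rounding up is an assumption, see above).
tablescan_cost() {
    _tc_adj=$(adjusted_mbrc "$2") || return 1
    awk -v b="$1" -v a="$_tc_adj" 'BEGIN { c = b / a; r = int(c); if (r < c) r++; print r }'
}

tablescan_cost 96 8    # 96 / 6.589 = 14.57, prints 15
```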

There is a minor change in version 9, by the way, where the tablescan cost is further adjusted by adding one to the result of the division, which means tablescans in V9 are generally just a little more expensive than in V8, so indexes are just a little more likely to be used.

Adjustments

We have seen that there are two assumptions built into the optimizer that are not very sensible.

* A single block read costs just as much as a multi-block read - (not really likely, particularly when running on file systems without direct I/O)
* A block access will be a physical disk read — (so what is the buffer cache for?)

Since the early days of Oracle 8.1, there have been a couple of parameters that allow us to correct these assumptions in a reasonably truthful way.

See Tim Gorman's article for a proper description of these parameters, but briefly:

Optimizer_index_cost_adj takes a value between 1 and 10000 with a default of 100. Effectively, this parameter describes how cheap a single block read is compared to a multiblock read. For example the value 30 (which is often a suitable first guess for an OLTP system) would tell Oracle that a single block read costs 30% of a multiblock read. Oracle would therefore incline towards using indexed access paths for low values of this parameter.

Optimizer_index_caching takes a value between 0 and 100 with a default of 0. This tells Oracle to assume that that percentage of index blocks will be found in the buffer cache. In this case, setting values close to 100 encourages the use of indexes over tablescans.

The really nice thing about both these parameters is that they can be set to "truthful" values.

Set the optimizer_index_caching to something in the region of the "buffer cache hit ratio." (You have to make your own choice about whether this should be the figure derived from the default pool, keep pool, or both).

The optimizer_index_cost_adj is a little more complicated. Check the typical wait times in v$system_event for the events "db file scattered read" (multi block reads) and "db file sequential read" (single block reads). Divide the latter by the former and multiply by one hundred.
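
As a worked example of that division, here is a minimal sketch. The two average wait times are invented numbers for illustration, and cost_adj is a made-up helper name; real figures would come from v$system_event.

```shell
#!/bin/sh
# cost_adj SEQ_AVG_WAIT SCATTERED_AVG_WAIT
# Single block read time divided by multiblock read time, times 100,
# rounded to the nearest whole number.
cost_adj() {
    awk -v s="$1" -v m="$2" 'BEGIN { printf "%d\n", s * 100 / m + 0.5 }'
}

# Hypothetical averages: 3 ms per single block read, 10 ms per multiblock read.
cost_adj 3 10    # prints 30
```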

Improvements

Don't forget that the two parameters may need to be adjusted at different times of the day and week to reflect the end-user workload. You can't just derive one pair of figures, and use them for ever.

Happily, in Oracle 9, things have improved. You can now collect system statistics, which originally included just these four:

+ Average single block read time
+ Average multi block read time
+ Average actual multiblock read
+ Notional usable CPU speed.

Suffice it to say that this feature is worth an article in its own right — but do note that the first three allow Oracle to discover the truth about the cost of multi block reads. And in fact, the CPU speed allows Oracle to work out the CPU cost of unsuitable access mechanisms like reading every single row in a block to find a specific data value and behave accordingly.

When you migrate to version 9, one of the first things you should investigate is the correct use of system statistics. This one feature alone may reduce the amount of time you spend trying to "tune" awkward SQL.

In passing, despite the wonderful effect of system statistics both of the optimizer adjusting parameters still apply, although the exact formula for their use seems to have changed between version 8 and version 9.

Variations on a Theme

Of course, I have picked one very special case (equality on a single column non-unique index, where there are no nulls in the table) and treated it very simply. (I haven't even mentioned the relevance of the index blevel and clustering_factor yet.) There are numerous different strategies that Oracle uses to work out more general cases.

Consider some of the cases I have conveniently overlooked:

+ Multi-column indexes
+ Part-used multi-column indexes
+ Range scans
+ Unique indexes
+ Non-unique indexes representing unique constraints
+ Index skip scans
+ Index only queries
+ Bitmap indexes
+ Effects of nulls

The list goes on and on. There is no one simple formula that tells you how Oracle works out a cost — there is only a general guideline that gives you the flavour of the approach and a list of different formulae that apply in different cases.

However, the purpose of this article was to make you aware of the general approach and the two assumptions built into the optimiser's strategy. And I hope that this may be enough to take you a long way down the path of understanding the (apparently) strange things that the optimiser has been known to do.

March 11, 2010

How to Troubleshoot Bad Execution Plans

Filed under: [System Performance tuning] - zhefeng @ 11:36 am

Very good SQL tuning article from Greg Rahn

Original Link:

One of the most common performance issues DBAs encounter are bad execution plans. Many try to resolve bad executions plans by setting optimizer related parameters or even hidden underscore parameters. Some even try to decipher a long and complex 10053 trace in hopes to find an answer. While changing parameters or analyzing a 10053 trace might be useful for debugging at some point, I feel there is a much more simple way to start to troubleshoot bad execution plans.

Verify The Query Matches The Business Question

This seems like an obvious thing to do, but I’ve seen numerous cases where the SQL query does not match the business question being asked. Do a quick sanity check verifying things like: join columns, group by, subqueries, etc. The last thing you want to do is consume time trying to debug a bad plan for an improperly written SQL query. Frequently I’ve found that this is the case for many of those “I’ve never got it to run to completion” queries.

What Influences The Execution Plan

I think it's important to understand what variables influence the Optimizer in order to focus the debugging effort. There are quite a number of variables, but the ones that frequently cause the problem are: (1) non-default optimizer parameters and (2) non-representative object/system statistics. Based on my observations I would say that the most abused Optimizer parameters are:

* OPTIMIZER_INDEX_CACHING
* OPTIMIZER_INDEX_COST_ADJ
* DB_FILE_MULTIBLOCK_READ_COUNT

Many see setting these as a solution to get the Optimizer to choose an index plan over a table scan plan, but this is problematic in several ways:

1. This is a global change to a local problem
2. Although it appears to solve one problem, it is unknown how many bad execution plans resulted from this change
3. The root cause of why the index plan was not chosen is unknown, just that tweaking parameters gave the desired result
4. Using non-default parameters makes it almost impossible to correctly and effectively troubleshoot the root cause

Object and system statistics can have a large influence on execution plans, but few actually take the time to sanity check them during triage. These statistics exist in views like:

* ALL_TAB_COL_STATISTICS
* ALL_PART_COL_STATISTICS
* ALL_INDEXES
* SYS.AUX_STATS$
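
A minimal sketch of such a sanity check might look like the following. The owner and table names (APP_OWNER, SERVICE) are placeholders for illustration only; substitute the objects from your own query.

```sql
-- Sanity check column statistics for one table.
-- APP_OWNER and SERVICE are hypothetical names; substitute your own.
select column_name, num_distinct, num_nulls, density,
       histogram, num_buckets, sample_size, last_analyzed
from   all_tab_col_statistics
where  owner      = 'APP_OWNER'
and    table_name = 'SERVICE'
order  by column_name;

-- System statistics used by the Optimizer's cost model.
select pname, pval1
from   sys.aux_stats$
where  sname = 'SYSSTATS_MAIN';
```

Things worth a second look are NUM_DISTINCT values that do not match what you know about the data, a small SAMPLE_SIZE, and an old LAST_ANALYZED date on a table that changes often.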

Using GATHER_PLAN_STATISTICS With DBMS_XPLAN.DISPLAY_CURSOR

As a first step of triage, I would suggest executing the query with a GATHER_PLAN_STATISTICS hint followed by a call to DBMS_XPLAN.DISPLAY_CURSOR. The GATHER_PLAN_STATISTICS hint allows for the collection of extra metrics during the execution of the query. Specifically, it shows us the Optimizer’s estimated number of rows (E-Rows) and the actual number of rows (A-Rows) for each row source. If the estimates are vastly different from the actuals, one probably needs to investigate why. For example, in the plan below, look at Id 8. The Optimizer estimates 5,899 rows and the row source actually returns 5,479,000 rows. If the estimate is off by three orders of magnitude (1000x), chances are the plan will be sub-optimal. Do note that with Nested Loop Joins E-Rows is a per-execution estimate, so you need to multiply the Starts column by the E-Rows column before comparing it to A-Rows (see Id 10).

select /*+ gather_plan_statistics */ … from … ;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

------------------------------------------------------------------------------------
| Id  | Operation                         | Name        | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------------------
|   1 | SORT GROUP BY                     |             |      1 |      1 |      1 |
|*  2 | FILTER                            |             |      1 |        |  1728K |
|   3 | NESTED LOOPS                      |             |      1 |      1 |  1728K |
|*  4 | HASH JOIN                         |             |      1 |      1 |  1728K |
|   5 | PARTITION LIST SINGLE             |             |      1 |   6844 |   3029 |
|*  6 | INDEX RANGE SCAN                  | PROV_IX13   |      1 |   6844 |   3029 |
|   7 | PARTITION LIST SINGLE             |             |      1 |   5899 |  5479K |
|*  8 | TABLE ACCESS BY LOCAL INDEX ROWID | SERVICE     |      1 |   5899 |  5479K |
|*  9 | INDEX SKIP SCAN                   | SERVICE_IX8 |      1 |   4934 |  5479K |
|  10 | PARTITION LIST SINGLE             |             |  1728K |      1 |  1728K |
|* 11 | INDEX RANGE SCAN                  | CLAIM_IX7   |  1728K |      1 |  1728K |
------------------------------------------------------------------------------------
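
If you prefer to do the Starts * E-Rows comparison in SQL rather than by eye, a rough sketch is to read the same numbers from V$SQL_PLAN_STATISTICS_ALL. This assumes the statement was run with the GATHER_PLAN_STATISTICS hint (otherwise the LAST_* columns are empty) and that you supply the SQL_ID and child number yourself via the SQL*Plus substitution variables shown.

```sql
-- Compare estimated and actual rows per plan line for one child cursor.
-- &sql_id and &child_no are SQL*Plus substitution variables you supply.
select id,
       operation || ' ' || options   as operation,
       object_name,
       last_starts                   as starts,
       cardinality                   as e_rows,
       cardinality * last_starts     as e_rows_x_starts,
       last_output_rows              as a_rows
from   v$sql_plan_statistics_all
where  sql_id       = '&sql_id'
and    child_number = &child_no
order  by id;
```

Plan lines where E_ROWS_X_STARTS and A_ROWS differ by orders of magnitude are the ones to investigate first.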

Using The CARDINALITY Hint

Now that I’ve demonstrated how to compare the cardinality estimates to the actual number of rows, what are the debugging options? If one asserts that the Optimizer will choose the optimal plan if it can accurately estimate the number of rows, one can test this using the undocumented (or at least not well documented) CARDINALITY hint. The CARDINALITY hint tells the Optimizer how many rows are coming out of a row source. The hint is generally used like this:

select /*+ cardinality(a 100) */ * from dual a;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 |   200 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| DUAL |   100 |   200 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

In this case I told the Optimizer that DUAL would return 100 rows (when in reality it returns 1 row), as seen in the Rows column of the autotrace output. The CARDINALITY hint is one tool one can use to give the Optimizer accurate information. I usually find this the best way to triage a bad plan as it is not a global change; it only affects a single execution of a statement in my session. If luck has it that using a CARDINALITY hint yields an optimal plan, one can move on to debugging where the cardinality is being miscalculated. Generally the bad cardinality is the result of non-representative table/column stats, but it may also be due to data correlation or other factors. This is where it pays off to know and understand the size and shape of the data. If the Optimizer still chooses a bad plan even with the correct cardinality estimates, it’s time to place a call to Oracle Support as more in-depth debugging is likely required.

Where Cardinality Can Go Wrong

There are several common scenarios that can lead to inaccurate cardinality estimates. Some of those on the list are:

1. Data skew: Is the NDV inaccurate due to data skew and a poor dbms_stats sample?
2. Data correlation: Are two or more predicates related to each other?
3. Out-of-range values: Is the predicate within the range of known values?
4. Use of functions in predicates: Is the 5% cardinality guess for functions accurate?
5. Stats gathering strategies: Is your stats gathering strategy yielding representative stats?

Some possible solutions to these issues are:

1. Data skew: Choose a sample size that yields accurate NDV. Use DBMS_STATS.AUTO_SAMPLE_SIZE in 11g.
2. Data correlation: Use Extended Stats in 11g. If <= 10.2.0.3 use a CARDINALITY hint if possible.
3. Out-of-range values: Gather or manually set the statistics.
4. Use of functions in predicates: Use a CARDINALITY hint where possible.
5. Stats gathering strategies: Use AUTO_SAMPLE_SIZE. Adjust only where necessary. Be mindful of tables with skewed data.
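
For the data correlation case, a minimal sketch of creating Extended Stats in 11g could look like this. The table and column names (ORDERS, STATE, ZIP_CODE) are made up for illustration; the idea is simply to create a column group on the correlated columns and then regather.

```sql
-- 11g: create a column group so the Optimizer can account for correlation
-- between two columns. ORDERS, STATE and ZIP_CODE are hypothetical names.
select dbms_stats.create_extended_stats(user, 'ORDERS', '(STATE, ZIP_CODE)')
from   dual;

-- Regather so the new column group gets statistics.
begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'ORDERS',
    method_opt => 'for all columns size auto');
end;
/
```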

How To Best Work With Oracle Support

If you are unable to get to the root cause on your own, it is likely that you will be in contact with Oracle Support. To best assist the support analyst I would recommend you gather the following in addition to the query text:

1. Output from the GATHER_PLAN_STATISTICS hint and DBMS_XPLAN.DISPLAY_CURSOR
2. SQLTXPLAN output. See Metalink Note 215187.1
3. 10053 trace output. See Metalink Note 225598.1
4. DDL for all objects used (and dependencies) in the query. This is best obtained as an expdp (Data Pump) export using CONTENT=METADATA_ONLY. This will also include the object statistics.
5. Output from: select pname, pval1 from sys.aux_stats$ where sname='SYSSTATS_MAIN';
6. A copy of your init.ora

Having this data ready before you even make the call (or create the SR on-line) should give you a jump on getting a quick(er) resolution.
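
For item 3, the 10053 trace is typically switched on at the session level with an event, roughly as sketched below. The trace is only written when the statement is hard parsed, so the statement has to be parsed fresh while the event is set; see the Metalink note above for the authoritative steps.

```sql
-- Enable the Optimizer (10053) trace for the current session.
alter session set events '10053 trace name context forever, level 1';

-- Run the problem statement here. It must be hard parsed for the trace
-- to be written; a changed comment in the SQL text is one way to force that.

-- Switch the trace off again.
alter session set events '10053 trace name context off';
```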

Summary

While this blog post is not meant to be a comprehensive troubleshooting guide for bad execution plans, I do hope that it does help point you in the right direction the next time you encounter one. Many of the Optimizer issues I’ve seen are due to incorrect cardinality estimates, quite often due to inaccurate NDV or the result of data correlation. I believe that if you use a systematic approach you will find that debugging bad execution plans may be as easy as just getting the cardinality estimate correct.

DBMS_STATS, METHOD_OPT and FOR ALL INDEXED COLUMNS

Filed under: [System Performance tuning] — zhefeng @ 10:14 am

Another very good article about dbms_stats package:

http://structureddata.org/2008/10/14/dbms_stats-method_opt-and-for-all-indexed-columns/

I’ve written before on choosing an optimal stats gathering strategy but I recently came across a scenario that I didn’t directly blog about and think it deserves attention. As I mentioned in that previous post, one should only deviate from the defaults when they have a reason to, and fully understand that reason and the effect of that decision.

Understanding METHOD_OPT

The METHOD_OPT parameter of DBMS_STATS controls two things:

1. on which columns statistics will be collected
2. on which columns histograms will be collected (and how many buckets)

It is very important to understand #1 and how the choice of METHOD_OPT affects the collection of column statistics.
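
As a small sketch, you can check what METHOD_OPT your database uses by default and spell that default out explicitly in a gather call. T1 here is just a placeholder table name, and GET_PARAM is the 10g-style call (in 11g, DBMS_STATS.GET_PREFS is the preferred equivalent).

```sql
-- Show the current default METHOD_OPT.
-- GET_PARAM is the 10g call; in 11g DBMS_STATS.GET_PREFS('METHOD_OPT') is preferred.
select dbms_stats.get_param('METHOD_OPT') from dual;

-- Gather with the default spelled out: stats on every column,
-- histograms and bucket counts decided by DBMS_STATS.
begin
  dbms_stats.gather_table_stats(
    ownname    => user,
    tabname    => 'T1',
    method_opt => 'for all columns size auto');
end;
/
```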

Prerequisite: Where Do I Find Column Statistics?

Understanding where to find column statistics is vital for troubleshooting bad execution plans. These views will be the arrows in your quiver:

* USER_TAB_COL_STATISTICS
* USER_PART_COL_STATISTICS
* USER_SUBPART_COL_STATISTICS

Depending on whether the table is partitioned or subpartitioned, and depending on what GRANULARITY the stats were gathered with, the latter two of those views may or may not be populated.

The Bane of METHOD_OPT: FOR ALL INDEXED COLUMNS

If you are using FOR ALL INDEXED COLUMNS as part of your METHOD_OPT you probably should not be. Allow me to explain. Using METHOD_OPT=>'FOR ALL INDEXED COLUMNS SIZE AUTO' (a common METHOD_OPT I see) tells DBMS_STATS: “only gather stats on columns that participate in an index and, based on data distribution and the workload of those indexed columns, decide if a histogram should be created and how many buckets it should contain”. Is that really what you want? My guess is probably not. Let me work through a few examples to explain why.

I’m going to start with this table.

SQL> exec dbms_random.initialize(1);

PL/SQL procedure successfully completed.

SQL> create table t1
  2  as
  3  select
  4    column_value pk,
  5    round(dbms_random.value(1,2)) a,
  6    round(dbms_random.value(1,5)) b,
  7    round(dbms_random.value(1,10)) c,
  8    round(dbms_random.value(1,100)) d,
  9    round(dbms_random.value(1,100)) e
 10  from table(counter(1,1000000))
 11  /

Table created.

SQL> begin
  2    dbms_stats.gather_table_stats(
  3      ownname => user ,
  4      tabname => 'T1' ,
  5      estimate_percent => 100 ,
  6      cascade => true);
  7  end;
  8  /

PL/SQL procedure successfully completed.

SQL> select
  2    COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS,
  3    to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  4  from user_tab_col_statistics
  5  where table_name='T1'
  6  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 NONE                      1 2008-13-10 18:39:51
A                      2 NONE                      1 2008-13-10 18:39:51
B                      5 NONE                      1 2008-13-10 18:39:51
C                     10 NONE                      1 2008-13-10 18:39:51
D                    100 NONE                      1 2008-13-10 18:39:51
E                    100 NONE                      1 2008-13-10 18:39:51

6 rows selected.

This 6 column table contains 1,000,000 rows of randomly generated numbers. I’ve queried USER_TAB_COL_STATISTICS to display some of the important attributes (NDV, Histogram, Number of Buckets, etc).

I’m going to now put an index on T1(PK), delete the stats and recollect stats using two different METHOD_OPT parameters that each use ‘FOR ALL INDEXED COLUMNS’.

SQL> create unique index PK_T1 on T1(PK);

Index created.

SQL> begin
  2    dbms_stats.delete_table_stats(user,'T1');
  3
  4    dbms_stats.gather_table_stats(
  5      ownname => user ,
  6      tabname => 'T1' ,
  7      estimate_percent => 100 ,
  8      method_opt => 'for all indexed columns' ,
  9      cascade => true);
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL> select COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS,
  2    to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  3  from user_tab_col_statistics
  4  where table_name='T1'
  5  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 HEIGHT BALANCED          75 2008-13-10 18:41:10

SQL> begin
  2    dbms_stats.delete_table_stats(user,'T1');
  3
  4    dbms_stats.gather_table_stats(
  5      ownname => user ,
  6      tabname => 'T1' ,
  7      estimate_percent => 100 ,
  8      method_opt => 'for all indexed columns size auto' ,
  9      cascade => true);
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL> select COLUMN_NAME, NUM_DISTINCT, HISTOGRAM, NUM_BUCKETS,
  2    to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  3  from user_tab_col_statistics
  4  where table_name='T1'
  5  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 NONE                      1 2008-13-10 18:41:12

Notice that in both cases only column PK has stats on it. Columns A,B,C,D and E do not have any stats collected on them. Also note that when no SIZE clause is specified, it defaults to 75 buckets.

Now one might think that is no big deal or perhaps they do not realize this is happening because they do not look at their stats. Let’s see what we get for cardinality estimates from the Optimizer for a few scenarios.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  /

  COUNT(*)
----------
    500227

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  4df0g0r99zmba, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t1 where a=1

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.24 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |  10000 |    500K|00:00:00.50 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A"=1)

Notice the E-Rows estimate for T1. The Optimizer is estimating 10,000 rows when in reality there are 500,227. The estimate is off by more than an order of magnitude (50x). Normally the calculation for the cardinality would be (for a one table single equality predicate):
number of rows in T1 * 1/NDV = 1,000,000 * 1/2 = 500,000
but in this case 10,000 is the estimate. Strangely enough (or not), 10,000 is exactly 0.01 (1%) of 1,000,000. Because there are no column stats for T1.A, the Optimizer is forced to make a guess, and that guess is 1%.

As you can see from the 10053 trace (below), since there are no statistics on the column, defaults are used. In this case they yield very poor cardinality estimations.

SINGLE TABLE ACCESS PATH
-----------------------------------------
BEGIN Single Table Cardinality Estimation
-----------------------------------------
Column (#2): A(NUMBER) NO STATISTICS (using defaults)
AvgLen: 13.00 NDV: 31250 Nulls: 0 Density: 3.2000e-05
Table: T1 Alias: T1
Card: Original: 1000000 Rounded: 10000 Computed: 10000.00 Non Adjusted: 10000.00
-----------------------------------------
END Single Table Cardinality Estimation
-----------------------------------------

Now that I’ve demonstrated how poor the cardinality estimation was with a single equality predicate, let’s see what two equality predicates gives us for a cardinality estimate.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  and b=3
  6  /

  COUNT(*)
----------
    124724

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  ctq8q59qdymw6, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t1 where a=1 and b=3

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.19 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    100 |    124K|00:00:00.25 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("A"=1 AND "B"=3))

Yikes. In this case the cardinality estimate is 100 when the actual number of rows is 124,724, a difference of over 3 orders of magnitude (over 1000x). Where did the 100 row estimate come from? In this case there are two equality predicates so the selectivity is calculated as 1% * 1% or 0.01 * 0.01 = 0.0001. 1,000,000 * 0.0001 = 100. Funny that. (The 1% is the default selectivity for an equality predicate w/o stats.)
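
The arithmetic is simple enough to check from SQL*Plus. The sketch below just multiplies out the default 1% selectivities against the 1,000,000 row table and compares them to the actual row counts from the two queries above; the column aliases are my own.

```sql
-- Default-selectivity arithmetic for a 1,000,000 row table without column stats.
select 1000000 * 0.01          as est_one_predicate,   -- 10000
       1000000 * 0.01 * 0.01   as est_two_predicates,  -- 100
       round(500227 / 10000)   as error_factor_one,    -- 50
       round(124724 / 100)     as error_factor_two     -- 1247
from   dual;
```

Each additional predicate without stats multiplies the estimate by another default guess, so the error compounds rather than averages out.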

Now let’s add a derived predicate as well and check the estimates.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  and b=3
  6  and d+e > 50
  7  /

  COUNT(*)
----------
    109816

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  5x200q9rqvvfu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t1 where a=1 and b=3
and d+e > 50

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.22 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |      5 |    109K|00:00:00.33 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("A"=1 AND "B"=3 AND "D"+"E">50))

Doh! The cardinality estimate is now 5, but the actual number of rows being returned is 109,816. Not good at all. The Optimizer estimated 5 rows because it used a default selectivity of 1% (for A=1) * 1% (for B=3) * 5% (for D+E > 50) * 1,000,000 rows. Now can you see why column statistics are very important? All it takes is a few predicates and the cardinality estimation becomes very small, very fast. Now consider this:

* What is likely to happen in a data warehouse where the queries are 5+ table joins and the fact table columns do not have indexes?
* Would the Optimizer choose the correct driving table?
* Would nested loops plans probably be chosen when it is really not appropriate?

Hopefully you can see where this is going. If you don’t, here is the all too common chain of events:

* Non-representative (or missing) statistics lead to
* Poor cardinality estimates, which lead to
* Poor access path selection, which leads to
* Poor join method selection, which leads to
* Poor join order selection, which leads to
* Poor SQL execution times

Take 2: Using the Defaults

Now I’m going to recollect stats with a default METHOD_OPT and run through the 3 execution plans again:

SQL> begin
  2    dbms_stats.delete_table_stats(user,'t1');
  3
  4    dbms_stats.gather_table_stats(
  5      ownname => user ,
  6      tabname => 'T1' ,
  7      estimate_percent => 100 ,
  8      degree => 8,
  9      cascade => true);
 10  end;
 11  /

PL/SQL procedure successfully completed.

SQL> select column_name, num_distinct, histogram, NUM_BUCKETS,
  2    to_char(LAST_ANALYZED,'yyyy-dd-mm hh24:mi:ss') LAST_ANALYZED
  3  from user_tab_col_statistics where table_name='T1'
  4  /

COLUMN_NAME NUM_DISTINCT HISTOGRAM       NUM_BUCKETS LAST_ANALYZED
----------- ------------ --------------- ----------- -------------------
PK               1000000 NONE                      1 2008-13-10 19:44:32
A                      2 FREQUENCY                 2 2008-13-10 19:44:32
B                      5 FREQUENCY                 5 2008-13-10 19:44:32
C                     10 FREQUENCY                10 2008-13-10 19:44:32
D                    100 NONE                      1 2008-13-10 19:44:32
E                    100 NONE                      1 2008-13-10 19:44:32

6 rows selected.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  /

  COUNT(*)
----------
    500227

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  4df0g0r99zmba, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t1 where a=1

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.20 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    500K|    500K|00:00:00.50 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("A"=1)

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  and b=3
  6  /

  COUNT(*)
----------
    124724

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  ctq8q59qdymw6, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t1 where a=1 and b=3

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.14 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    124K|    124K|00:00:00.25 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("B"=3 AND "A"=1))

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  and b=3
  6  and d+e > 50
  7  /

  COUNT(*)
----------
    109816

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  5x200q9rqvvfu, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t1 where a=1 and b=3
and d+e>50

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.17 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |   6236 |    109K|00:00:00.22 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("B"=3 AND "A"=1 AND "D"+"E">50))

As you can see, the first two queries have spot-on cardinality estimates, but the third query isn’t as good because its predicate uses an expression over two columns (D+E) and there are no stats on that expression, only on D and E individually. The estimate of 6,236 is roughly the two-predicate estimate of about 124K multiplied by the same 5% default guess seen earlier. I’m going to rerun the third query with dynamic sampling set to 4 (in 10g it defaults to 2) and reevaluate the cardinality estimate.

SQL> alter session set optimizer_dynamic_sampling=4;

Session altered.

SQL> select /*+ gather_plan_statistics */
  2    count(*)
  3  from t1
  4  where a=1
  5  and b=3
  6  and d+e > 50
  7  /

  COUNT(*)
----------
    109816

SQL> select * from table(dbms_xplan.display_cursor(null, null, 'allstats last'));

PLAN_TABLE_OUTPUT
-------------------------------------------------------------------------------------
SQL_ID  5x200q9rqvvfu, child number 1
-------------------------------------
select /*+ gather_plan_statistics */ count(*) from t1 where a=1 and b=3
and d+e > 50

Plan hash value: 3724264953

-------------------------------------------------------------------------------------
| Id  | Operation          | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-------------------------------------------------------------------------------------
|   1 |  SORT AGGREGATE    |      |      1 |      1 |      1 |00:00:00.17 |    3466 |
|*  2 |   TABLE ACCESS FULL| T1   |      1 |    102K|    109K|00:00:00.22 |    3466 |
-------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("B"=3 AND "A"=1 AND "D"+"E">50))

Note
-----
   - dynamic sampling used for this statement

Bingo! Close enough to call statistically equivalent.
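
If changing the session setting is too broad, one alternative worth trying is the DYNAMIC_SAMPLING hint, which requests sampling for a single statement. The sketch below applies level 4 to T1 for this query only; I have not shown the resulting plan, so check the E-Rows on your own system.

```sql
-- Statement-level alternative to ALTER SESSION: sample T1 at level 4
-- for this query only.
select /*+ gather_plan_statistics dynamic_sampling(t1 4) */
       count(*)
from   t1
where  a = 1
and    b = 3
and    d + e > 50;
```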

Summary

I hope this little exercise demonstrates how important it is to have representative statistics and that when statistics are representative the Optimizer can very often accurately estimate the cardinality and thus choose the best plan for the query. Remember these points:

* Recent statistics do not necessarily equate to representative statistics.
* Statistics are required on all columns to yield good plans – not just indexed columns.
* You probably should not be using METHOD_OPT => ‘FOR ALL INDEXED COLUMNS SIZE AUTO’, especially in a data warehouse where indexes are used sparingly.
* Dynamic Sampling can assist with cardinality estimates where existing stats are not enough.
