DBA Sensation

April 12, 2012

Set Oracle SGA > 256GB

Filed under: [Installation] — zhefeng @ 2:06 pm

I had an installation request for Oracle 11gR2 on a server with 2 TB of memory. The installation failed in DBCA with complaints about not being able to reach shared memory.

Checking Metalink didn't turn up any solution. A colleague told me he had hit the same issue before, and Oracle told him to set the SGA to less than 256 GB as a "workaround".

I followed the "workaround" and continued my installation. Later I did some research and found this:

 

Solution

Checking the swap space and the kernel parameters, everything was set as recommended by Oracle. Investigating further, it turns out this is caused by the prelink command, which calculates shared library load addresses and updates the shared libraries with them. The simplest fix is to undo what prelink did and disable it:
prelink -ua
sed -i 's/PRELINKING=yes/PRELINKING=no/' /etc/sysconfig/prelink
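To verify the change took effect (my addition; this assumes the stock /etc/sysconfig/prelink layout on RHEL):

# grep ^PRELINKING /etc/sysconfig/prelink
PRELINKING=no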

 

From: https://support.oracle.com/CSP/ui/flash.html#tab=KBHome%28page=KBHome&id=%28%29%29,%28page=KBNavigator&id=%28bmDocTitle=Why%20not%20able%20to%20allocate%20a%20more%20SGA%20than%20193G%20on%20Linux%2064?&from=BOOKMARK&bmDocType=HOWTO&bmDocID=1241284.1&viewingMode=1143&bmDocDsrc=KB%29%29

Doc ID: 1241284.1

I haven't tried it yet. Anyone having the same problem can give it a try and let me know.

September 7, 2010

Oracle 10g ASM/RAW storage migration

Filed under: [RAC] — zhefeng @ 9:47 am

Objective:
We want to migrate the whole shared storage from the old SAN to the new SAN without re-installing the whole Oracle RAC.

Scenario:
1. Current structure
[Nodes]
## eth1-Public
10.0.0.101 vmrac01 vmrac01.test.com
10.0.0.102 vmrac02 vmrac02.test.com
## eth0-Private
192.168.199.1 vmracprv01 vmracprv01.test.com
192.168.199.2 vmracprv02 vmracprv02.test.com
## VIP
10.0.0.103 vmracvip01 vmracvip01.test.com
10.0.0.104 vmracvip02 vmracvip02.test.com

[Storage]
Both ORACLE_HOME and CRS_HOME are local:
ORACLE_HOME=/database/oracle/10grac/db
CRS_HOME=/database/oracle/10grac/crs

Shared LUN display (3 partitions, 2*256M for OCR&VOTING, 1*20G for ASM)
Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 32 257008+ 83 Linux
/dev/sdb2 33 64 257040 83 Linux
/dev/sdb3 65 2610 20450745 83 Linux

OCR and Voting are on RAW device: /dev/sdb1 /dev/sdb2

ASM disks
bash-3.1$ export ORACLE_SID=+ASM1
bash-3.1$ asmcmd
ASMCMD> lsdg
State Type Rebal Unbal Sector Block AU Total_MB Free_MB Req_mir_free_MB Usable_file_MB Offline_disks Name
MOUNTED EXTERN N N 512 4096 1048576 19971 17925 0 17925 0 DG1/

2. New storage (sdc 10G)
1). new LUN added
[root@vmrac01 bin]# fdisk -l

Disk /dev/sda: 26.8 GB, 26843545600 bytes
255 heads, 63 sectors/track, 3263 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sda1 * 1 13 104391 83 Linux
/dev/sda2 14 535 4192965 82 Linux swap / Solaris
/dev/sda3 536 3263 21912660 83 Linux

Disk /dev/sdb: 21.4 GB, 21474836480 bytes
255 heads, 63 sectors/track, 2610 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdb1 1 32 257008+ 83 Linux
/dev/sdb2 33 64 257040 83 Linux
/dev/sdb3 65 2610 20450745 83 Linux

Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

2). Partition the new LUN into 3 partitions
Disk /dev/sdc: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdc1 1 32 257008+ 83 Linux
/dev/sdc2 33 64 257040 83 Linux
/dev/sdc3 65 1305 9968332+ 83 Linux

3). Clone the data from the old raw disks
**Shut down the database and CRS first to make sure the raw disks don't change!
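As a sketch, on 10g RAC the shutdown looks something like this (my addition; the database name racdb is a placeholder, and crsctl must be run as root on each node):

$ srvctl stop database -d racdb
# crsctl stop crs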
#dd if=/dev/raw/raw1 of=/dev/sdc1
514017+0 records in
514017+0 records out
263176704 bytes (263 MB) copied, 252.812 seconds, 1.0 MB/s

#dd if=/dev/raw/raw2 of=/dev/sdc2
514080+0 records in
514080+0 records out
263208960 bytes (263 MB) copied, 267.868 seconds, 983 kB/s
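Before re-binding, it is worth verifying the copies byte for byte. A checksum sketch (my addition, not in the original steps; source and target partitions here happen to be exactly the same size, so whole-device sums are comparable):

# dd if=/dev/raw/raw1 bs=1M | md5sum
# dd if=/dev/sdc1 bs=1M | md5sum

The two sums should match, and likewise for raw2 and sdc2.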

4). "Cheating" Oracle by re-binding to the new devices on both nodes
**old binding
Step1: add entries to /etc/udev/rules.d/60-raw.rules
ACTION=="add", KERNEL=="sdb1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdb2", RUN+="/bin/raw /dev/raw/raw2 %N"

Step2: For the mapping to take effect immediately, run the commands below:
#raw /dev/raw/raw1 /dev/sdb1
#raw /dev/raw/raw2 /dev/sdb2

Step3: Run the following commands and add them to the /etc/rc.local file.
#chown oracle:dba /dev/raw/raw1
#chown oracle:dba /dev/raw/raw2
#chmod 660 /dev/raw/raw1
#chmod 660 /dev/raw/raw2
#chown oracle:dba /dev/sdb1
#chown oracle:dba /dev/sdb2
#chmod 660 /dev/sdb1
#chmod 660 /dev/sdb2

**new binding on both nodes
Step1: edit /etc/udev/rules.d/60-raw.rules
ACTION=="add", KERNEL=="sdc1", RUN+="/bin/raw /dev/raw/raw1 %N"
ACTION=="add", KERNEL=="sdc2", RUN+="/bin/raw /dev/raw/raw2 %N"

Step2: apply the mapping immediately
#raw /dev/raw/raw1 /dev/sdc1
#raw /dev/raw/raw2 /dev/sdc2

Step3: set permissions and edit /etc/rc.local
#chown oracle:dba /dev/raw/raw1
#chown oracle:dba /dev/raw/raw2
#chmod 660 /dev/raw/raw1
#chmod 660 /dev/raw/raw2
#chown oracle:dba /dev/sdc1
#chown oracle:dba /dev/sdc2
#chmod 660 /dev/sdc1
#chmod 660 /dev/sdc2
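To confirm the new bindings took effect (my addition; on this layout sdc1 and sdc2 are major 8, minors 33 and 34):

# raw -qa
/dev/raw/raw1:  bound to major 8, minor 33
/dev/raw/raw2:  bound to major 8, minor 34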

5). Start up CRS and the Oracle database, then check the database: everything works fine after switching the raw disks!

3. ASM disk group migration
1). Mark the new disk sdc3 on one node
# /etc/init.d/oracleasm createdisk VOL2 /dev/sdc3
Marking disk "/dev/sdc3" as an ASM disk: [ OK ]

2). Scan disks on the other node
[root@vanpgvmrac02 bin]# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]

3). Now verify the new disk is visible on both nodes
[root@vmrac01 disks]# /etc/init.d/oracleasm listdisks
VOL1
VOL2

[root@vmrac02 bin]# /etc/init.d/oracleasm listdisks
VOL1
VOL2

4). Add the new disk to the disk group (from the ASM instance)
$ export ORACLE_SID=+ASM1
$ sqlplus / as sysdba
SQL> alter diskgroup DG1 add disk 'ORCL:VOL2';
-- wait for rebalancing
SQL> select * from v$asm_operation;

5). Remove the old disk from the disk group
SQL> alter diskgroup DG1 drop disk VOL1;
-- wait until rebalancing has finished
SQL> select * from v$asm_operation;
GROUP_NUMBER OPERATION STATE POWER ACTUAL SOFAR EST_WORK EST_RATE EST_MINUTES
------------ --------- ----- ----- ------ ----- -------- -------- -----------
           1 REBAL     RUN       1      1     2     1374       30          45

6). Verify the database and ASM: everything is OK!
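One extra check I would add at the ASM level (my addition) before wiping anything: confirm the dropped disk is no longer a diskgroup member. After the drop completes, VOL1 should show a header status of FORMER:

SQL> select name, path, header_status from v$asm_disk;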

7). Clean up the old disk configurations
[root@vmrac01 bin]# /etc/init.d/oracleasm deletedisk VOL1
Removing ASM disk "VOL1": [ OK ]
[root@vmrac01 bin]# /etc/init.d/oracleasm listdisks
VOL2

[root@vmrac02 ~]# /etc/init.d/oracleasm scandisks
Scanning system for ASM disks: [ OK ]
[root@vmrac02 ~]# /etc/init.d/oracleasm listdisks
VOL2

8). Wipe the partitions off /dev/sdb.

Reference:
1. Exact Steps To Migrate ASM Diskgroups To Another SAN Without Downtime. [ID 837308.1]
2. Previous doc "VMRAC installation" task 130.2008.09.12
3. OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE), including moving from RAW Devices to Block Devices. [ID 428681.1]
4. ASM using ASMLib and Raw Devices
http://www.oracle-base.com/articles/10g/ASMUsingASMLibAndRawDevices.php

June 9, 2010

Something about checkpoints

Filed under: 1. Oracle, [System Performance tuning] — zhefeng @ 2:31 pm

Reading an article about checkpoints on Metalink (Checkpoint Tuning and Troubleshooting Guide [ID 147468.1]).

Here are some good points about checkpoints:

Oracle writes the dirty buffers to disk only on certain conditions:
- A shadow process must scan more than one-quarter of the db_block_buffers parameter.
- Every three seconds.
- When a checkpoint is produced.

A checkpoint is triggered by five types of events:
- At each switch of the redo log files.
- When the delay set by LOG_CHECKPOINT_TIMEOUT is reached.
- When the amount of data corresponding to (LOG_CHECKPOINT_INTERVAL * size of OS I/O blocks) has been written to the current redo log file.
- Directly, by the ALTER SYSTEM SWITCH LOGFILE command.
- Directly, with the ALTER SYSTEM CHECKPOINT command.
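A quick way to watch this happen (my sketch, not from the note): force a checkpoint and observe the checkpoint SCN recorded in the datafile headers move forward.

SQL> select max(checkpoint_change#) from v$datafile_header;
SQL> alter system checkpoint;
SQL> select max(checkpoint_change#) from v$datafile_header;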

During a checkpoint the following occurs:
- The database writer (DBWR) writes all modified database blocks in the buffer cache back to the datafiles.
- The checkpoint process (CKPT) updates the headers of all the datafiles to indicate when the last checkpoint occurred (SCN).

May 25, 2010

Can’t compile a stored procedure when it’s locked

Filed under: 1. Oracle, [PL/SQL dev&tuning] — zhefeng @ 10:25 am

Trying to recompile a procedure causes the application to hang (i.e., SQL*Plus hangs after submitting the statement). Eventually ORA-4021 errors occur after the timeout (usually 5 minutes). Here is the solution from Metalink, Note ID 107756.1:

Error: ORA-4021
Text: time-out occurred while waiting to lock object
-----------------------------------------------------------------------------
Cause: While trying to lock a library object, a time-out occurred.
Action: Retry the operation later.

Solution Description
--------------------

Verify that the package is not locked by another user by selecting from
V$ACCESS view. To do this, run:

SELECT * FROM v$access WHERE object = '<package_name>';

where <package_name> is the package name (usually in all uppercase). If a row is returned, then the package is locked and cannot be dropped until the lock is released. The query also returns the SID that holds the lock; you can use this to find out which session has obtained the lock.
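For example, a sketch (my addition) that joins to v$session to identify the holding session:

SELECT s.sid, s.serial#, s.username, s.program
FROM v$session s, v$access a
WHERE a.object = '<package_name>'
AND s.sid = a.sid;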

In some cases, that session might have been killed and will not show up. If this happens, the lock will not be released immediately. Waiting for PMON to clean up the lock might take some time; the fastest way to clean up the lock is to recycle the database instance.

If an ORA-4021 error is not returned and the command continues to hang after issuing the CREATE OR REPLACE or DROP statement, you will need to do further analysis to see where the hang is occurring. A starting point is to have a look in v$session_wait; see the referenced Note 61552.1 for how to analyze hang situations in general.

Solution Explanation
--------------------

Consider the following example:

Session 1:

create or replace procedure lockit(secs in number) as
shuttime date;
begin
shuttime := sysdate + secs/(24*60*60);
while sysdate <= shuttime loop
null;
end loop;
end;
/
show err

begin
-- wait 10 minutes
lockit(600);
end;
/

Session 2:
create or replace procedure lockit as
begin
null;
end;
/

Result: hang and eventually (the timeout is 5 minutes):

create or replace procedure lockit as
*
ERROR at line 1:
ORA-04021: timeout occurred while waiting to lock object LOCKIT

Session 3:

connect / as sysdba
col owner for a10
col object for a15
select * from v$access where object = 'LOCKIT';

Result:
       SID OWNER      OBJECT          TYPE
---------- ---------- --------------- ------------------------
         9 OPS$HNAPEL LOCKIT          PROCEDURE

select sid, event from v$session_wait;

Result:

       SID EVENT
---------- ----------------------------------------------------------------
         9 null event
        12 library cache pin

In the above result, the blocking SID 9 is waiting on nothing, while session 12, the hanging session, is waiting on the library cache pin event.

March 12, 2010

Why Isn’t Oracle Using My Index?!

Filed under: [System Performance tuning] — zhefeng @ 4:02 pm

By Jonathan Lewis
http://www.dbazine.com/oracle/or-articles/jlewis12

The question in the title of this piece is probably the single most frequently occurring question that appears in the Metalink forums and Usenet newsgroups. This article uses a test case that you can rebuild on your own systems to demonstrate the most fundamental issues with how cost-based optimisation works. And at the end of the article, you should be much better equipped to give an answer the next time you hear that dreaded question.

Because of the wide variety of options that are available when installing Oracle, it isn't usually safe to predict exactly what will happen when someone runs a script that you have dictated to them. But I'm going to risk it, in the hope that your database is a fairly vanilla installation, with the default values for the most commonly tweaked parameters. The example has been built and tested on an 8.1.7 database with the db_block_size set to the commonly used value of 8K and the db_file_multiblock_read_count set to the equally commonly used value 8. The results may be a little different under Oracle 9.2.

Run the script from Figure 1, which creates a couple of tables, then indexes and analyses them.

create table t1 as
select
trunc((rownum-1)/15) n1,
trunc((rownum-1)/15) n2,
rpad('x', 215) v1
from all_objects
where rownum <= 3000;

create table t2 as
select
mod(rownum,200) n1,
mod(rownum,200) n2,
rpad('x',215) v1
from all_objects
where rownum <= 3000;

create index t1_i1 on t1(N1);
create index t2_i1 on t2(n1);

analyze table t1 compute
statistics;
analyze table t2 compute
statistics;

Figure 1: The test data sets.

Once you have got this data in place, you might want to convince yourself that the two sets of data are identical — in particular, that the N1 columns in both data sets have values ranging from 0 to 199, with 15 occurrences of each value. You might try the following check:

select n1, count(*)
from t1
group by n1;

and the matching query against T2 to prove the point.

If you then execute the queries:

select * from t1 where n1 = 45;
select * from t2 where n1 = 45;

You will find that each query returns 15 rows. However if you

set autotrace traceonly explain

you will discover that the two queries have different execution paths.

The query against table T1 uses the index, but the query against table T2 does a full tablescan.

So you have two sets of identical data, with dramatically different access paths for the same query.
What Happened to the Index?

Note: if you've ever come across any of those "magic number" guidelines regarding the use of indexes, e.g., "Oracle will use an index for less than 23 percent, 10 percent, 2 percent (pick number at random) of the data," then you may at this stage begin to doubt their validity. In this example, Oracle has used a tablescan for 15 rows out of 3,000, i.e., for just one half of one percent of the data!

To investigate problems like this, there is one very simple ploy that I always try as the first step: Put in some hints to make Oracle do what I think it ought to be doing, and see if that gives me any clues.

In this case, a simple hint:

/*+ index(t2, t2_i1) */

is sufficient to switch Oracle from the full tablescan to the indexed access path. The three paths with costs (abbreviated to C=nnn) are shown in Figure 2:

select * from t1 where n1 = 45;

EXECUTION PLAN
--------------
TABLE ACCESS BY INDEX ROWID OF T1 (C=2)
INDEX(RANGE SCAN) OF T1_I1 (C=1)

select * from t2 where n1 = 45;

EXECUTION PLAN
--------------
TABLE ACCESS FULL OF T2 (C=15)

select /*+ index(t2 t2_i1) */
*
from t2
where n1 = 45;

EXECUTION PLAN
--------------
TABLE ACCESS BY INDEX ROWID OF T2 (C=16)
INDEX(RANGE SCAN) OF T2_I1 (C=1)

Figure 2: The different queries and their costs.

So why hasn't Oracle used the index by default for the T2 query? Easy: as the execution plan shows, the cost of doing the tablescan is cheaper than the cost of using the index.
Why is the Tablescan Cheaper?

This, of course, is simply begging the question. Why is the cost of the tablescan cheaper than the cost of using the index?

By looking into this question, you uncover the key mechanisms (and critically erroneous assumptions) of the Cost Based Optimiser.

Let's start by examining the indexes by running the query:

select
table_name,
blevel,
avg_data_blocks_per_key,
avg_leaf_blocks_per_key,
clustering_factor
from user_indexes;

The results are given in the table below:

                      T1      T2
Blevel                 1       1
Data blocks / key      1      15
Leaf blocks / key      1       1
Clustering factor     96    3000

Note particularly the value for "data blocks per key." This is the number of different blocks in the table that Oracle thinks it will have to visit if you execute a query that contains an equality test on a complete key value for this index.

So where do the costs for our queries come from? As far as Oracle is concerned, if we fire in the key value 45, we get the data from table T1 by hitting one index leaf block and one table block — two blocks, so a cost of two.

If we try the same with table T2, we have to hit one index leaf block and 15 table blocks — a total of 16 blocks, so a cost of 16.

Clearly, according to this viewpoint, the index on table T1 is much more desirable than the index on table T2. This leaves two questions outstanding, though:

Where does the tablescan cost come from, and why are the figures for the avg_data_blocks_per_key so different between the two tables?

The answer to the second question is simple. Look back at the definition of table T1: it uses the trunc() function to generate the N1 values, dividing "rownum - 1" by 15 and truncating.

Trunc(675/15) = 45
Trunc(676/15) = 45

Trunc(689/15) = 45

All the rows with the value 45 actually appear one after the other in a tight little clump (probably all fitting in one data block) in the table.

Table T2 uses the mod() function to generate the N1 values, using modulus 200 on the rownum:

mod(45,200) = 45
mod(245,200) = 45

mod(2845,200) = 45

The rows with the value 45 appear every two hundredth position in the table (probably resulting in no more than one row in every relevant block).

By doing the analyze, Oracle was able to get a perfect description of the data scatter in our table. So the optimiser was able to work out exactly how many blocks Oracle would have to visit to answer our query — and, in simple cases, the number of block visits is the cost of the query.
But Why the Tablescan?

So we see that an indexed access into T2 is more expensive than the same path into T1, but why has Oracle switched to the tablescan?

This brings us to the two simple-minded, and rather inappropriate, assumptions that Oracle makes.

The first is that every block acquisition equates to a physical disk read, and the second is that a multiblock read is just as quick as a single block read.

So what impact do these assumptions have on our experiment?

If you query the user_tables view with the following SQL:

select
table_name,
blocks
from user_tables;

you will find that our two tables each cover 96 blocks.

At the start of the article, I pointed out that the test case was running a version 8 system with the value 8 for the db_file_multiblock_read_count.

Roughly speaking, Oracle has decided that it can read the entire 96 block table in 96/8 = 12 disk read requests.

Since it takes 16 block (= disk read) requests to access the table by index, it is clearly quicker (from Oracle's sadly deluded perspective) to scan the table: after all, 12 is less than 16.

Voila! If the data you are targeting is suitably scattered across the table, you get tablescans even for a very small percentage of the data, a problem that can be exaggerated in the case of very big blocks and very small rows.
Correction

In fact, you will have noticed that my calculated number of scan reads was 12, whilst the cost reported in the execution plan was 15. It is a slight simplification to say that the cost of a tablescan (or an index fast full scan for that matter) is

'number of blocks' /
db_file_multiblock_read_count.

Oracle uses an "adjusted" multi-block read value for the calculation (although it then tries to use the actual requested size when the scan starts to run).

For reference, the following table compares a few of the actual and adjusted values:

Actual    Adjusted
     4       4.175
     8       6.589
    16      10.398
    32      16.409
    64      25.895
   128      40.865

As you can see, Oracle makes some attempt to protect you from the error of supplying an unfeasibly large value for this parameter.

There is a minor change in version 9, by the way, where the tablescan cost is further adjusted by adding one to the result of the division, which means tablescans in V9 are generally just a little more expensive than in V8, so indexes are just a little more likely to be used.
Adjustments

We have seen that there are two assumptions built into the optimizer that are not very sensible.

* A single block read costs just as much as a multi-block read (not really likely, particularly when running on file systems without direct I/O).
* A block access will be a physical disk read (so what is the buffer cache for?)

Since the early days of Oracle 8.1, there have been a couple of parameters that allow us to correct these assumptions in a reasonably truthful way.

See Tim Gorman's article for a proper description of these parameters, but briefly:

Optimizer_index_cost_adj takes a value between 1 and 10000 with a default of 100. Effectively, this parameter describes how cheap a single block read is compared to a multiblock read. For example the value 30 (which is often a suitable first guess for an OLTP system) would tell Oracle that a single block read costs 30% of a multiblock read. Oracle would therefore incline towards using indexed access paths for low values of this parameter.

Optimizer_index_caching takes a value between 0 and 100 with a default of 0. This tells Oracle to assume that that percentage of index blocks will be found in the buffer cache. In this case, setting values close to 100 encourages the use of indexes over tablescans.

The really nice thing about both these parameters is that they can be set to "truthful" values.

Set the optimizer_index_caching to something in the region of the "buffer cache hit ratio." (You have to make your own choice about whether this should be the figure derived from the default pool, keep pool, or both).

The optimizer_index_cost_adj is a little more complicated. Check the typical wait times in v$system_event for the events "db file scattered read" (multi block reads) and "db file sequential reads" (single block reads). Divide the latter by the former and multiply by one hundred.
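As a sketch of that calculation (my addition; the average_wait column of v$system_event is in centiseconds, but since we take a ratio the unit cancels out):

select round(100 * seq.average_wait / scat.average_wait) as oica
from (select average_wait from v$system_event
      where event = 'db file sequential read') seq,
     (select average_wait from v$system_event
      where event = 'db file scattered read') scat;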
Improvements

Don't forget that the two parameters may need to be adjusted at different times of the day and week to reflect the end-user workload. You can't just derive one pair of figures, and use them for ever.

Happily, in Oracle 9, things have improved. You can now collect system statistics, which initially include just these four:

+ Average single block read time
+ Average multi block read time
+ Average actual multiblock read
+ Notional usable CPU speed.

Suffice it to say that this feature is worth an article in its own right — but do note that the first three allow Oracle to discover the truth about the cost of multi block reads. And in fact, the CPU speed allows Oracle to work out the CPU cost of unsuitable access mechanisms like reading every single row in a block to find a specific data value and behave accordingly.

When you migrate to version 9, one of the first things you should investigate is the correct use of system statistics. This one feature alone may reduce the amount of time you spend trying to "tune" awkward SQL.
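A minimal sketch of gathering workload system statistics in 9i (my addition; see the documentation for the INTERVAL mode and other options):

SQL> execute dbms_stats.gather_system_stats('start');
-- let a representative workload run for a while, then:
SQL> execute dbms_stats.gather_system_stats('stop');
-- inspect what was collected
SQL> select pname, pval1 from sys.aux_stats$ where sname = 'SYSSTATS_MAIN';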

In passing, despite the wonderful effect of system statistics, both of the optimizer adjusting parameters still apply — although the exact formula for their use seems to have changed between version 8 and version 9.
Variations on a Theme

Of course, I have picked one very special case — equality on a single column non-unique index, where there are no nulls in the table — and treated it very simply. (I haven't even mentioned the relevance of the index blevel and clustering_factor yet.) There are numerous different strategies that Oracle uses to work out more general cases.

Consider some of the cases I have conveniently overlooked:

+ Multi-column indexes
+ Part-used multi-column indexes
+ Range scans
+ Unique indexes
+ Non-unique indexes representing unique constraints
+ Index skip scans
+ Index only queries
+ Bitmap indexes
+ Effects of nulls

The list goes on and on. There is no one simple formula that tells you how Oracle works out a cost — there is only a general guideline that gives you the flavour of the approach and a list of different formulae that apply in different cases.

However, the purpose of this article was to make you aware of the general approach and the two assumptions built into the optimiser's strategy. And I hope that this may be enough to take you a long way down the path of understanding the (apparently) strange things that the optimiser has been known to do.

March 11, 2010

How to Troubleshoot Bad Execution Plans

Filed under: [System Performance tuning] — zhefeng @ 11:36 am

A very good SQL tuning article from Greg Rahn.

Original Link:

One of the most common performance issues DBAs encounter is bad execution plans. Many try to resolve bad execution plans by setting optimizer-related parameters or even hidden underscore parameters. Some even try to decipher a long and complex 10053 trace in hopes of finding an answer. While changing parameters or analyzing a 10053 trace might be useful for debugging at some point, I feel there is a much simpler way to start troubleshooting bad execution plans.

Verify The Query Matches The Business Question

This seems like an obvious thing to do, but I’ve seen numerous cases where the SQL query does not match the business question being asked. Do a quick sanity check verifying things like: join columns, group by, subqueries, etc. The last thing you want to do is consume time trying to debug a bad plan for an improperly written SQL query. Frequently I’ve found that this is the case for many of those “I’ve never got it to run to completion” queries.

What Influences The Execution Plan

I think it's important to understand what variables influence the Optimizer in order to focus the debugging effort. There are quite a number of variables, but frequently the problematic ones are: (1) non-default optimizer parameters and (2) non-representative object/system statistics. Based on my observations I would say that the most abused Optimizer parameters are:

* OPTIMIZER_INDEX_CACHING
* OPTIMIZER_INDEX_COST_ADJ
* DB_FILE_MULTIBLOCK_READ_COUNT

Many see setting these as a solution to get the Optimizer to choose an index plan over a table scan plan, but this is problematic in several ways:

1. This is a global change to a local problem
2. Although it appears to solve one problem, it is unknown how many bad execution plans resulted from this change
3. The root cause of why the index plan was not chosen is unknown, just that tweaking parameters gave the desired result
4. Using non-default parameters makes it almost impossible to correctly and effectively troubleshoot the root cause

Object and system statistics can have a large influence on execution plans, but few actually take the time to sanity check them during triage. These statistics exist in views like:

* ALL_TAB_COL_STATISTICS
* ALL_PART_COL_STATISTICS
* ALL_INDEXES
* SYS.AUX_STATS$
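A quick sanity-check sketch against those views (my addition; MYSCHEMA and MYTABLE are placeholders for your own objects):

select pname, pval1 from sys.aux_stats$ where sname = 'SYSSTATS_MAIN';

select column_name, num_distinct, density, histogram, last_analyzed
from all_tab_col_statistics
where owner = 'MYSCHEMA' and table_name = 'MYTABLE';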

Using GATHER_PLAN_STATISTICS With DBMS_XPLAN.DISPLAY_CURSOR

As a first step of triage, I would suggest executing the query with a GATHER_PLAN_STATISTICS hint followed by a call to DBMS_XPLAN.DISPLAY_CURSOR. The GATHER_PLAN_STATISTICS hint allows for the collection of extra metrics during the execution of the query. Specifically, it shows us the Optimizer’s estimated number of rows (E-Rows) and the actual number of rows (A-Rows) for each row source. If the estimates are vastly different from the actual, one probably needs to investigate why. For example: In the below plan, look at line 8. The Optimizer estimates 5,899 rows and the row source actually returns 5,479,000 rows. If the estimate is off by three orders of magnitude (1000), chances are the plan will be sub-optimal. Do note that with Nested Loop Joins you need to multiply the Starts column by the E-Rows column to get the A-Rows values (see line 10).
select /*+ gather_plan_statistics */ … from … ;
select * from table(dbms_xplan.display_cursor(null, null, 'ALLSTATS LAST'));

------------------------------------------------------------------------------------------
| Id  | Operation                          | Name        | Starts | E-Rows | A-Rows |
------------------------------------------------------------------------------------------
|   1 | SORT GROUP BY                      |             |      1 |      1 |      1 |
|*  2 | FILTER                             |             |      1 |        |  1728K |
|   3 | NESTED LOOPS                       |             |      1 |      1 |  1728K |
|*  4 | HASH JOIN                          |             |      1 |      1 |  1728K |
|   5 | PARTITION LIST SINGLE              |             |      1 |   6844 |   3029 |
|*  6 | INDEX RANGE SCAN                   | PROV_IX13   |      1 |   6844 |   3029 |
|   7 | PARTITION LIST SINGLE              |             |      1 |   5899 |  5479K |
|*  8 | TABLE ACCESS BY LOCAL INDEX ROWID  | SERVICE     |      1 |   5899 |  5479K |
|*  9 | INDEX SKIP SCAN                    | SERVICE_IX8 |      1 |   4934 |  5479K |
|  10 | PARTITION LIST SINGLE              |             |  1728K |      1 |  1728K |
|* 11 | INDEX RANGE SCAN                   | CLAIM_IX7   |  1728K |      1 |  1728K |
------------------------------------------------------------------------------------------

Using The CARDINALITY Hint

Now that I’ve demonstrated how to compare the cardinality estimates to the actual number of rows, what are the debugging options? If one asserts that the Optimizer will choose the optimal plan if it can accurately estimate the number of rows, one can test using the not so well (un)documented CARDINALITY hint. The CARDINALITY hint tells the Optimizer how many rows are coming out of a row source. The hint is generally used like such:
select /*+ cardinality(a 100) */ * from dual a;

--------------------------------------------------------------------------
| Id  | Operation         | Name | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |      |   100 |   200 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS FULL| DUAL |   100 |   200 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------

In this case I told the Optimizer that DUAL would return 100 rows (when in reality it returns 1 row), as seen in the Rows column of the autotrace output. The CARDINALITY hint is one tool one can use to give the Optimizer accurate information. I usually find this the best way to triage a bad plan, as it is not a global change; it only affects a single execution of a statement in my session. If luck has it that using a CARDINALITY hint yields an optimal plan, one can move on to debugging where the cardinality is being miscalculated. Generally the bad cardinality is the result of non-representative table/column stats, but it may also be due to data correlation or other factors. This is where it pays off to know and understand the size and shape of the data. If the Optimizer still chooses a bad plan even with correct cardinality estimates, it's time to place a call to Oracle Support as more in-depth debugging is likely required.

Where Cardinality Can Go Wrong

There are several common scenarios that can lead to inaccurate cardinality estimates. Some of those on the list are:

1. Data skew: Is the NDV inaccurate due to data skew and a poor dbms_stats sample?
2. Data correlation: Are two or more predicates related to each other?
3. Out-of-range values: Is the predicate within the range of known values?
4. Use of functions in predicates: Is the 5% cardinality guess for functions accurate?
5. Stats gathering strategies: Is your stats gathering strategy yielding representative stats?

Some possible solutions to these issues are:

1. Data skew: Choose a sample size that yields accurate NDV. Use DBMS_STATS.AUTO_SAMPLE_SIZE in 11g.
2. Data correlation: Use Extended Stats in 11g. If <= 10.2.0.3 use a CARDINALITY hint if possible.
3. Out-of-range values: Gather or manually set the statistics.
4. Use of functions in predicates: Use a CARDINALITY hint where possible.
5. Stats gathering strategies: Use AUTO_SAMPLE_SIZE. Adjust only where necessary. Be mindful of tables with skewed data.
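For item 2, a sketch of creating extended statistics (column-group stats) on 11g (my addition; the SALES table and its columns are hypothetical):

-- create the column group, then re-gather stats so it is populated
select dbms_stats.create_extended_stats(user, 'SALES', '(CUST_ID, PROD_ID)') from dual;
exec dbms_stats.gather_table_stats(user, 'SALES');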

How To Best Work With Oracle Support

If you are unable to get to the root cause on your own, it is likely that you will be in contact with Oracle Support. To best assist the support analyst I would recommend you gather the following in addition to the query text:

1. Output from the GATHER_PLAN_STATISTICS and DBMS_XPLAN.DISPLAY_CURSOR
2. SQLTXPLAN output. See Metalink Note 215187.1
3. 10053 trace output. See Metalink Note 225598.1
4. DDL for all objects used (and dependencies) in the query. This is best obtained as an expdp (Data Pump) export using CONTENT=METADATA_ONLY. This will also include the object statistics.
5. Output from: select pname, pval1 from sys.aux_stats$ where sname='SYSSTATS_MAIN';
6. A copy of your init.ora

Having this data ready before you even make the call (or create the SR on-line) should give you a jump on getting a quick(er) resolution.

Summary

While this blog post is not meant to be a comprehensive troubleshooting guide for bad execution plans, I do hope that it does help point you in the right direction the next time you encounter one. Many of the Optimizer issues I’ve seen are due to incorrect cardinality estimates, quite often due to inaccurate NDV or the result of data correlation. I believe that if you use a systematic approach you will find that debugging bad execution plans may be as easy as just getting the cardinality estimate correct.

March 10, 2010

Using Histograms to Help Oracle Cost-Based Optimizer Make Better Decisions

Filed under: [System Performance tuning] — zhefeng @ 5:36 pm

Found a very good article about histograms; here is the original link:
http://support.confio.com/blog/tag/methodopt/38/

Introduction

Histograms are a feature of the cost-based optimizer (CBO) that allows the Oracle engine to determine how data is distributed within a column. They are most useful for a column that is included in the WHERE clause of SQL and the data distribution is skewed.

Example

Assume a table named PROCESS_QUEUE with one million rows including a column named PROCESSED_FLAG with five distinct values. Also assume a query similar to the following is executed:

SELECT id, serial_number
FROM process_queue
WHERE processed_flag = 'N';

SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1087 Card=260363 Bytes=7029801)
TABLE ACCESS (FULL) OF 'PROCESS_QUEUE' (TABLE) (Cost=1087 Card=260363 Bytes=7029801)

Without histograms and only five distinct values, Oracle assumes an even data distribution and would most likely perform a full table scan for this query. With one million rows and five values, Oracle assumes that each value would return 200,000 rows, or 20% of the rows.

Data Skew

However, what if the data for the PROCESSED_FLAG column was skewed:

SELECT processed_flag, COUNT(1)
FROM process_queue
GROUP BY processed_flag;

PROCESSED_FLAG      COUNT
-------------- ----------
P                      24
Y                  999345
E                      30
S                     568
N                      33

In this case, only 33 rows have a value of 'N', so there has to be a way to tell Oracle to use the index on the PROCESSED_FLAG column. That is where histograms come into use. A histogram would include data similar to the above and allow Oracle to know that only 33 rows would be returned for this query.

Collecting Histograms

To collect histograms for this column, a command similar to the following could be used:

EXECUTE DBMS_STATS.GATHER_TABLE_STATS(user, 'PROCESS_QUEUE', method_opt => 'for columns processed_flag size 5')

SELECT id, serial_number
FROM process_queue
WHERE processed_flag = ‘N’;

SELECT STATEMENT Optimizer=ALL_ROWS (Cost=1 Card=28 Bytes=756)
TABLE ACCESS (BY INDEX ROWID) OF 'PROCESS_QUEUE' (TABLE) (Cost=1 Card=28 Bytes=756)
INDEX (RANGE SCAN) OF 'PQ_IX1' (INDEX) (Cost=1 Card=28)
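To confirm the histogram is in place after gathering, a query like this (my addition) should report a FREQUENCY histogram on the column:

SELECT column_name, histogram, num_buckets
FROM user_tab_col_statistics
WHERE table_name = 'PROCESS_QUEUE';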

Notes About Histograms

Note 1: Using histograms works best for SQL statements that use literal values. If a statement uses a bind variable, the first time the query is parsed, Oracle will peek at the value of the bind variable and choose a plan accordingly. That same plan will be used until the SQL is reparsed. In this case, if the bind variable was 'Y' the first time, Oracle may perform a full table scan for this query no matter what value was passed in from then on.

The opposite may also be true. Assume a similar data distribution to the above but with 100 distinct values for the PROCESSED_FLAG column. The rows that have a 'Y' value are still 95% of the rows. However, if you used the criteria "WHERE processed_flag='Y'", without histograms Oracle may decide to use the index when a full table scan may be a better option.

Note 2: The defaults for the METHOD_OPT parameter changed between Oracle 9i and 10g. In 9i the parameter defaulted to 'for all columns size 1', which essentially turns off histograms. The default value in Oracle 10g is 'for all columns size auto', which means that Oracle will decide whether or not to collect histograms for a column. In my experience it seems that unnecessary histograms are collected and histogram data is not collected for some columns where it would be useful.

Conclusion

Histograms allow Oracle to make much better performance decisions. The case we discussed in this article is one way that histograms are used and is commonly referred to as “table access method” histograms. Another use for histograms, referred to as “table order join” histograms, is to help Oracle decide the order in which tables will be joined. This helps the CBO know the size of the result sets or “cardinality” to properly determine the correct order in which to do joins.

March 8, 2010

Index Full Scan vs Index Fast Full Scan

Filed under: [System Performance tuning] — zhefeng @ 2:06 pm

http://spaces.msn.com/members/wzwanghai/

[Oracle] Index Full Scan vs Index Fast Full Scan
Author: Wanghai (汪海)
Date: 14-Aug-2005
Source: http://spaces.msn.com/members/wzwanghai/


Are index full scan and index fast full scan the same thing? The answer is no. Although the two look almost the same on paper, the mechanisms behind them are completely different. Let's take a look at where they differ.

First, where can an IFS or FFS be used? If all the columns a SQL statement needs are contained in an index, then either an index full scan or an index fast full scan can be adopted in place of a full table scan. For example:

SQL> CREATE TABLE TEST AS SELECT * FROM dba_objects WHERE 0=1;

SQL> CREATE INDEX ind_test_id ON TEST(object_id);

SQL> INSERT INTO TEST
SELECT *
FROM dba_objects
WHERE object_id IS NOT NULL AND object_id > 10000
ORDER BY object_id DESC;

17837 rows created.

SQL> analyze table test compute statistics for table for all columns for all indexes;

Table analyzed.

SQL> set autotrace trace;

SQL> select object_id from test;

17837 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=68 Card=17837 Bytes=71348)
1 0 TABLE ACCESS (FULL) OF 'TEST' (Cost=68 Card=17837 Bytes=71348)

Here Oracle chooses a full table scan, because the object_id column is nullable by default. Change it to NOT NULL:

SQL>alter table test modify(object_id not null);

SQL> select object_id from test;

17837 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=17837 Bytes=71348)
1 0 INDEX (FAST FULL SCAN) OF 'IND_TEST_ID' (NON-UNIQUE) (Cost=11 Card=17837 Bytes=71348)

Of course, we can also use an index full scan:

SQL> select/*+ index(test ind_TEST_ID)*/ object_id from test;

17837 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=17837 Bytes=71348)
1 0 INDEX (FULL SCAN) OF 'IND_TEST_ID' (NON-UNIQUE) (Cost=101 Card=17837 Bytes=71348)

We have seen that both can be used in this situation, so what is the difference? One place where the difference shows up is the output order. Let's look at the result sets of the two; to make it easier to see, we fetch only 10 rows.

INDEX FAST FULL SCAN

SQL> select object_id from test where rownum <= 10;

INDEX FULL SCAN

SQL> select /*+ index(test ind_TEST_ID) */ object_id from test where rownum <= 10;

The fast full scan returns rows in the order they are physically stored, while the full scan returns them in ascending key order. Now find the object_id of the index itself:

SQL> select object_id from dba_objects where object_name='IND_TEST_ID';

OBJECT_ID
----------
70591

The object_id of the index is 70591. Using a tree dump we can see the structure of the index tree:

SQL> ALTER SESSION SET EVENTS 'immediate trace name TREEDUMP level 70591';

----- begin tree dump
branch: 0x6809b8d 109091725 (0: nrow: 100, level: 1)
leaf: 0x6809b96 109091734 (-1: nrow: 294 rrow: 0)
leaf: 0x6c07ec1 113278657 (0: nrow: 262 rrow: 0)
leaf: 0x6c07ebd 113278653 (1: nrow: 518 rrow: 0)
leaf: 0x6c07eb1 113278641 (2: nrow: 524 rrow: 0)
leaf: 0x6c07ead 113278637 (3: nrow: 524 rrow: 0)
leaf: 0x6c07ea9 113278633 (4: nrow: 524 rrow: 0)
leaf: 0x6c07ea5 113278629 (5: nrow: 524 rrow: 0)
leaf: 0x6c07ea1 113278625 (6: nrow: 524 rrow: 0)
leaf: 0x6c07e9d 113278621 (7: nrow: 524 rrow: 0)
leaf: 0x6c07e99 113278617 (8: nrow: 524 rrow: 0)
leaf: 0x6c07e95 113278613 (9: nrow: 532 rrow: 0)
leaf: 0x6c07e91 113278609 (10: nrow: 524 rrow: 0)
leaf: 0x6c07e8d 113278605 (11: nrow: 524 rrow: 0)
leaf: 0x6c07ec8 113278664 (12: nrow: 524 rrow: 0)
leaf: 0x6c07ec4 113278660 (13: nrow: 524 rrow: 0)
leaf: 0x6c07ec0 113278656 (14: nrow: 524 rrow: 0)
leaf: 0x6c07ebc 113278652 (15: nrow: 524 rrow: 0)
leaf: 0x6809bb2 109091762 (16: nrow: 524 rrow: 0)
leaf: 0x6c07eb8 113278648 (17: nrow: 524 rrow: 0)
leaf: 0x6c07eb4 113278644 (18: nrow: 524 rrow: 0)
leaf: 0x6c07eb0 113278640 (19: nrow: 524 rrow: 0)
leaf: 0x6c07eac 113278636 (20: nrow: 524 rrow: 0)
leaf: 0x6809bae 109091758 (21: nrow: 524 rrow: 0)
leaf: 0x6c07ea8 113278632 (22: nrow: 524 rrow: 0)
leaf: 0x6c07ea4 113278628 (23: nrow: 524 rrow: 0)
leaf: 0x6c07ea0 113278624 (24: nrow: 105 rrow: 105)
leaf: 0x6c07e9c 113278620 (25: nrow: 129 rrow: 129)
leaf: 0x6c07eb9 113278649 (26: nrow: 123 rrow: 123)
leaf: 0x6809baa 109091754 (27: nrow: 246 rrow: 246)
leaf: 0x6c07e98 113278616 (28: nrow: 246 rrow: 246)
leaf: 0x6c07e94 113278612 (29: nrow: 246 rrow: 246)
leaf: 0x6809ba6 109091750 (30: nrow: 246 rrow: 246)
leaf: 0x6809bce 109091790 (31: nrow: 246 rrow: 246)
leaf: 0x6809bca 109091786 (32: nrow: 246 rrow: 246)
leaf: 0x6809c05 109091845 (33: nrow: 248 rrow: 248)
leaf: 0x6809c01 109091841 (34: nrow: 246 rrow: 246)
leaf: 0x6809bfd 109091837 (35: nrow: 246 rrow: 246)
leaf: 0x6809bf9 109091833 (36: nrow: 246 rrow: 246)
leaf: 0x6809bf5 109091829 (37: nrow: 246 rrow: 246)
leaf: 0x6809bf1 109091825 (38: nrow: 246 rrow: 246)
leaf: 0x6809bed 109091821 (39: nrow: 246 rrow: 246)
leaf: 0x6809be9 109091817 (40: nrow: 246 rrow: 246)
leaf: 0x6809be5 109091813 (41: nrow: 246 rrow: 246)
leaf: 0x6809be1 109091809 (42: nrow: 246 rrow: 246)
leaf: 0x6809bdd 109091805 (43: nrow: 246 rrow: 246)
leaf: 0x6809bd9 109091801 (44: nrow: 246 rrow: 246)
leaf: 0x6809bd5 109091797 (45: nrow: 246 rrow: 246)
leaf: 0x6809bd1 109091793 (46: nrow: 248 rrow: 248)
leaf: 0x6809bcd 109091789 (47: nrow: 246 rrow: 246)
leaf: 0x6809bc9 109091785 (48: nrow: 246 rrow: 246)
leaf: 0x6809c08 109091848 (49: nrow: 246 rrow: 246)
leaf: 0x6809c04 109091844 (50: nrow: 246 rrow: 246)
leaf: 0x6809c00 109091840 (51: nrow: 246 rrow: 246)
leaf: 0x6809bfc 109091836 (52: nrow: 246 rrow: 246)
leaf: 0x6809bf8 109091832 (53: nrow: 246 rrow: 246)
leaf: 0x6809bf4 109091828 (54: nrow: 246 rrow: 246)
leaf: 0x6809bf0 109091824 (55: nrow: 246 rrow: 246)
leaf: 0x6809bec 109091820 (56: nrow: 246 rrow: 246)
leaf: 0x6809be8 109091816 (57: nrow: 246 rrow: 246)
leaf: 0x6809be4 109091812 (58: nrow: 246 rrow: 246)
leaf: 0x6809be0 109091808 (59: nrow: 248 rrow: 248)
leaf: 0x6809bdc 109091804 (60: nrow: 246 rrow: 246)
leaf: 0x6809bd8 109091800 (61: nrow: 246 rrow: 246)
leaf: 0x6809bd4 109091796 (62: nrow: 246 rrow: 246)
leaf: 0x6809bd0 109091792 (63: nrow: 246 rrow: 246)
leaf: 0x6809bcc 109091788 (64: nrow: 246 rrow: 246)
leaf: 0x6809c07 109091847 (65: nrow: 246 rrow: 246)
leaf: 0x6809c03 109091843 (66: nrow: 246 rrow: 246)
leaf: 0x6809bff 109091839 (67: nrow: 246 rrow: 246)
leaf: 0x6809bfb 109091835 (68: nrow: 246 rrow: 246)
leaf: 0x6809bf7 109091831 (69: nrow: 246 rrow: 246)
leaf: 0x6809bf3 109091827 (70: nrow: 246 rrow: 246)
leaf: 0x6809bef 109091823 (71: nrow: 246 rrow: 246)
leaf: 0x6809beb 109091819 (72: nrow: 248 rrow: 248)
leaf: 0x6809be7 109091815 (73: nrow: 246 rrow: 246)
leaf: 0x6809be3 109091811 (74: nrow: 246 rrow: 246)
leaf: 0x6809bdf 109091807 (75: nrow: 246 rrow: 246)
leaf: 0x6809bdb 109091803 (76: nrow: 246 rrow: 246)
leaf: 0x6809bd7 109091799 (77: nrow: 246 rrow: 246)
leaf: 0x6809bd3 109091795 (78: nrow: 246 rrow: 246)
leaf: 0x6809bcf 109091791 (79: nrow: 246 rrow: 246)
leaf: 0x6809bcb 109091787 (80: nrow: 246 rrow: 246)
leaf: 0x6809c06 109091846 (81: nrow: 246 rrow: 246)
leaf: 0x6809c02 109091842 (82: nrow: 246 rrow: 246)
leaf: 0x6809bfe 109091838 (83: nrow: 246 rrow: 246)
leaf: 0x6809bfa 109091834 (84: nrow: 246 rrow: 246)
leaf: 0x6809ba2 109091746 (85: nrow: 129 rrow: 129)
leaf: 0x6c07eb5 113278645 (86: nrow: 123 rrow: 123)
leaf: 0x6809bf6 109091830 (87: nrow: 246 rrow: 246)
leaf: 0x6809bf2 109091826 (88: nrow: 246 rrow: 246)
leaf: 0x6809bee 109091822 (89: nrow: 246 rrow: 246)
leaf: 0x6809bea 109091818 (90: nrow: 246 rrow: 246)
leaf: 0x6809b9e 109091742 (91: nrow: 246 rrow: 246)
leaf: 0x6809be6 109091814 (92: nrow: 246 rrow: 246)
leaf: 0x6809be2 109091810 (93: nrow: 246 rrow: 246)
leaf: 0x6809bde 109091806 (94: nrow: 246 rrow: 246)
leaf: 0x6809bda 109091802 (95: nrow: 246 rrow: 246)
leaf: 0x6809b9a 109091738 (96: nrow: 246 rrow: 246)
leaf: 0x6809bd6 109091798 (97: nrow: 246 rrow: 246)
leaf: 0x6809bd2 109091794 (98: nrow: 246 rrow: 246)
----- end tree dump

The rows returned by the index full scan come from block 0x6c07ea0, while the index fast full scan reads block 0x6809b9a, which is the block that comes first in physical storage order among those containing data. Let's look at the contents of these two blocks:
0x6c07ea0 = 113278624 in decimal
0x6809b9a = 109091738 in decimal

SQL> select dbms_utility.data_block_address_file(113278624) "file", dbms_utility.data_block_address_block(113278624) "block" from dual;

      file      block
---------- ----------
        27      32416

SQL> select dbms_utility.data_block_address_file(109091738) "file", dbms_utility.data_block_address_block(109091738) "block" from dual;

      file      block
---------- ----------
        26      39834

SQL> alter system dump datafile 27 block 32416;

SQL> alter system dump datafile 26 block 39834;

The first 10 rows of block 32416:

row#0[6564] flag: -----, lock: 2
col 0; len 4; (4): c3 02 07 11
col 1; len 6; (6): 07 00 7c 20 00 2b
row#1[6578] flag: -----, lock: 2
col 0; len 4; (4): c3 02 16 4e
col 1; len 6; (6): 07 00 7c 20 00 2a
row#2[6592] flag: -----, lock: 2
col 0; len 4; (4): c3 02 16 4f
col 1; len 6; (6): 07 00 7c 20 00 29
row#3[6606] flag: -----, lock: 2
col 0; len 4; (4): c3 02 16 50
col 1; len 6; (6): 07 00 7c 20 00 28
row#4[6620] flag: -----, lock: 2
col 0; len 4; (4): c3 02 18 02
col 1; len 6; (6): 07 00 7c 20 00 27
row#5[6634] flag: -----, lock: 2
col 0; len 4; (4): c3 02 23 60
col 1; len 6; (6): 07 00 7c 20 00 26
row#6[6648] flag: -----, lock: 2
col 0; len 4; (4): c3 02 24 25
col 1; len 6; (6): 07 00 7c 20 00 25
row#7[6662] flag: -----, lock: 2
col 0; len 4; (4): c3 02 24 28
col 1; len 6; (6): 07 00 7c 20 00 24
row#8[6676] flag: -----, lock: 2
col 0; len 4; (4): c3 02 28 18
col 1; len 6; (6): 07 00 7c 20 00 23
row#9[6690] flag: -----, lock: 2
col 0; len 4; (4): c3 02 42 04
col 1; len 6; (6): 07 00 7c 20 00 22

The first 10 rows of block 39834:
row#0[4591] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 43
col 1; len 6; (6): 02 81 71 f6 00 36
row#1[4605] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 44
col 1; len 6; (6): 02 81 71 f6 00 35
row#2[4619] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 45
col 1; len 6; (6): 02 81 71 f6 00 34
row#3[4633] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 46
col 1; len 6; (6): 02 81 71 f6 00 33
row#4[4647] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 47
col 1; len 6; (6): 02 81 71 f6 00 32
row#5[4661] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 48
col 1; len 6; (6): 02 81 71 f6 00 31
row#6[4675] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 49
col 1; len 6; (6): 02 81 71 f6 00 30
row#7[4689] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 4a
col 1; len 6; (6): 02 81 71 f6 00 2f
row#8[4703] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 4b
col 1; len 6; (6): 02 81 71 f6 00 2e
row#9[4717] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 4c
col 1; len 6; (6): 02 81 71 f6 00 2d

Compare this against the earlier result sets. The first row of block 32416 is 10616, whose internal storage format should be:

SQL> select dump(10616,16) from dual;

DUMP(10616,16)
----------------------
Typ=2 Len=4: c3,2,7,11

which is indeed what the block dump shows:

row#0[6564] flag: -----, lock: 2
col 0; len 4; (4): c3 02 07 11
col 1; len 6; (6): 07 00 7c 20 00 2b

Now look at the first row of block 39834:

SQL> select dump(66266,16) from dual;

DUMP(66266,16)
-----------------------
Typ=2 Len=4: c3,7,3f,43

which also matches the dump:

row#0[4591] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 43
col 1; len 6; (6): 02 81 71 f6 00 36

This demonstrates the difference between index full scan and index fast full scan described above. We can also use event 10046 to trace the path each one takes.

SQL> ALTER SESSION SET EVENTS 'immediate trace name flush_cache';

(Flush the buffer cache so that the 'db file sequential read' and 'db file scattered read' events can be observed.)

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> select object_id from test where rownum <= 10;

SQL> alter session set events '10046 trace name context off';

Session altered.

[oracle@csdbc udump]$ grep read cs-dbc_ora_15596.trc

Redo thread mounted by this instance: 1
WAIT #1: nam='db file sequential read' ela= 33 p1=26 p2=39820 p3=1
WAIT #1: nam='db file sequential read' ela= 21 p1=26 p2=39817 p3=1
WAIT #1: nam='db file sequential read' ela= 17 p1=26 p2=39819 p3=1
WAIT #1: nam='db file parallel read' ela= 53 p1=2 p2=2 p3=2
WAIT #1: nam='db file scattered read' ela= 466 p1=26 p2=39821 p3=16

The leading 'db file sequential read' waits come from reading the segment header and similar operations. What we care about is the 'db file scattered read' event: an index fast full scan uses multiblock reads, reading db_file_multiblock_read_count blocks (set to 16 in this example) starting from block 39821. Block 39834, the one we are interested in, falls right inside that range.
Now look at the 10046 trace of the index full scan:

SQL> ALTER SESSION SET EVENTS 'immediate trace name flush_cache';

(Flush the buffer cache again so that the read events can be observed.)

SQL> alter session set events '10046 trace name context forever, level 12';

Session altered.

SQL> select /*+ index(test ind_test_id) */ object_id from test where rownum <= 10;

OBJECT_ID
----------
10616
12177
12178
12179
12301
13495
13536
13539
13923
16503

10 rows selected.

SQL> alter session set events '10046 trace name context off';

Session altered.

[oracle@csdbc udump]$ grep read cs-dbc_ora_15609.trc

Redo thread mounted by this instance: 1
WAIT #1: nam='db file sequential read' ela= 49 p1=26 p2=39821 p3=1
(This is the root block: 0x6809b8d from the earlier index tree dump.)
WAIT #1: nam='db file sequential read' ela= 32 p1=26 p2=39830 p3=1
WAIT #1: nam='db file sequential read' ela= 40 p1=27 p2=32449 p3=1
WAIT #1: nam='db file sequential read' ela= 35 p1=27 p2=32445 p3=1
WAIT #1: nam='db file sequential read' ela= 28 p1=27 p2=32433 p3=1
WAIT #1: nam='db file sequential read' ela= 19 p1=27 p2=32429 p3=1
WAIT #1: nam='db file sequential read' ela= 34 p1=27 p2=32425 p3=1
WAIT #1: nam='db file sequential read' ela= 32 p1=27 p2=32421 p3=1
WAIT #1: nam='db file sequential read' ela= 33 p1=27 p2=32417 p3=1
WAIT #1: nam='db file sequential read' ela= 29 p1=27 p2=32413 p3=1
WAIT #1: nam='db file sequential read' ela= 37 p1=27 p2=32409 p3=1
WAIT #1: nam='db file sequential read' ela= 32 p1=27 p2=32405 p3=1
WAIT #1: nam='db file sequential read' ela= 35 p1=27 p2=32401 p3=1
WAIT #1: nam='db file sequential read' ela= 34 p1=27 p2=32397 p3=1
WAIT #1: nam='db file sequential read' ela= 31 p1=27 p2=32456 p3=1
WAIT #1: nam='db file sequential read' ela= 29 p1=27 p2=32452 p3=1
WAIT #1: nam='db file sequential read' ela= 31 p1=27 p2=32448 p3=1
WAIT #1: nam='db file sequential read' ela= 30 p1=27 p2=32444 p3=1
WAIT #1: nam='db file sequential read' ela= 38 p1=26 p2=39858 p3=1
WAIT #1: nam='db file sequential read' ela= 31 p1=27 p2=32440 p3=1
WAIT #1: nam='db file sequential read' ela= 32 p1=27 p2=32436 p3=1
WAIT #1: nam='db file sequential read' ela= 35 p1=27 p2=32432 p3=1
WAIT #1: nam='db file sequential read' ela= 31 p1=27 p2=32428 p3=1
WAIT #1: nam='db file sequential read' ela= 29 p1=26 p2=39854 p3=1
WAIT #1: nam='db file sequential read' ela= 36 p1=27 p2=32424 p3=1
WAIT #1: nam='db file sequential read' ela= 32 p1=27 p2=32420 p3=1
WAIT #1: nam='db file sequential read' ela= 36 p1=27 p2=32416 p3=1

The path the index full scan takes is exactly as described earlier: locate the root block, then follow the linked list of leaf blocks, reading one block at a time. By now the difference between index full scan and index fast full scan should be fairly clear. One last point: the two also differ with respect to sorting.

SQL> set autotrace trace;

SQL> select object_id from test order by object_id;

17837 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=17837 Bytes=71348)
1 0 INDEX (FULL SCAN) OF 'IND_TEST_ID' (NON-UNIQUE) (Cost=101 Card=17837 Bytes=71348)

Because of the ORDER BY, Oracle automatically chose an index full scan, which avoids the sort. What if we force an index fast full scan?

SQL> select/*+ index_ffs(test ind_test_id)*/object_id from test order by object_id;
17837 rows selected.

Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=CHOOSE (Cost=59 Card=17837 Bytes=71348)
1 0 SORT (ORDER BY) (Cost=59 Card=17837 Bytes=71348)
2 1 INDEX (FAST FULL SCAN) OF 'IND_TEST_ID' (NON-UNIQUE) (Cost=11 Card=17837 Bytes=71348)

The index fast full scan needs an extra SORT (ORDER BY) step. Anyone who has read this article carefully should know why: the fast full scan reads the index blocks in physical order using multiblock I/O, so its output does not come back sorted by key.

July 27, 2009

Two ways to call an Oracle stored procedure from VB.NET

Filed under: [PL/SQL dev&tuning] — zhefeng @ 10:05 pm

PROCEDURE TEST_C(temp out varchar2,a IN varchar2, b in varchar2)
IS
BEGIN
temp:=a || b;
END;

Solution 1:
add "Imports System.Data.OleDb" at the beginning of the code

Dim dbConn As New OleDbConnection
Dim dbComm As OleDbCommand

dbConn.ConnectionString = "Provider=MSDAORA;User ID=xxx;Password=xxx;Data Source=xxx;"
dbConn.Open()
dbComm = dbConn.CreateCommand

dbComm.Parameters.Add("temp", OleDbType.VarChar, 30).Direction = ParameterDirection.Output
dbComm.Parameters.Add("a", OleDbType.VarChar, 30).Direction = ParameterDirection.Input
dbComm.Parameters("a").Value = "test "
dbComm.Parameters.Add("b", OleDbType.VarChar, 30).Direction = ParameterDirection.Input
dbComm.Parameters("b").Value = "OK"

dbComm.CommandText = "TEST_C"
dbComm.CommandType = CommandType.StoredProcedure
dbComm.ExecuteNonQuery()
dbConn.Close()

MessageBox.Show(dbComm.Parameters("temp").Value)

Solution 2:
add "Imports System.Data.OracleClient" at the beginning of the code

Dim oraConn As New OracleConnection
Dim oraComm As New OracleCommand

oraConn.ConnectionString = "Data Source=xxx;User Id=xxx;Password=xxx"
oraComm.Connection = oraConn

oraComm.Parameters.Add("temp", OracleType.VarChar, 10).Direction = ParameterDirection.Output
oraComm.Parameters.Add("a", OracleType.VarChar, 10).Direction = ParameterDirection.Input
oraComm.Parameters("a").Value = "test "
oraComm.Parameters.Add("b", OracleType.VarChar, 10).Direction = ParameterDirection.Input
oraComm.Parameters("b").Value = "OK"

oraConn.Open()
oraComm.CommandText = "TEST_C"
oraComm.CommandType = CommandType.StoredProcedure
oraComm.ExecuteNonQuery()
oraConn.Close()

MessageBox.Show(oraComm.Parameters("temp").Value)

Note: the first parameter name has to be the same as the Oracle stored procedure's parameter name, and if there is a dblink in the stored procedure, then Solution 1 is the only choice.

If you are trying to call an Oracle stored procedure passing CLOB/LOB parameters, don't use an ODBC solution, because it has a 32K limitation. Oracle has a Metalink note about this:
From Metalink: Subject: 32k Limitation When Passing LOB Parameter Through Stored Procedure, Doc ID: 252102.1
https://metalink2.oracle.com/metalink/plsql/f?p=130:14:6916025231277951933::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,126125.1,1,0,1,helvetica

Workaround
~~~~~~~~~~

Workaround 1:

Use the Oracle Provider for OLEDB instead, making sure to set the command
property SPPrmsLOB to TRUE:

objCmd.Properties("SPPrmsLOB") = TRUE

Workaround 2:

Instead of passing a CLOB as a parameter to a stored procedure, use a method
that directly interfaces with the database and does not require the use of
a stored procedure to update the CLOBs as in the following example:

Note 126125.1 – ADO Streaming BLOB & CLOB Example Using ODBC and OLEDB in VB (SCR 1388).

March 30, 2009

Keep data consistency when using Oracle exp/expdp

Filed under: [backup and recovery] — zhefeng @ 1:25 pm

When we were using the old Oracle exp, we usually set consistent=y (the default is n) to ensure data consistency (the image taken of the data in the tables being exported represents the committed state of the table data at the same single point in time for all of the tables being exported).

However, starting from 10g this parameter was decommissioned. Today I happened to have a requirement for it, searched Metalink, and found this useful piece:

The versions 10gR1 and 10gR2 additionally put the message in the expdp header:

FLASHBACK automatically enabled to preserve database integrity

Does this guarantee export consistency to a single point of time?

Cause
The message:

FLASHBACK automatically enabled to preserve database integrity

only means that some of the tables will be assigned special SCNs (needed for Streams and Logical Standby). There is no consistency guaranteed between exported tables.

The next example demonstrates this:

1. Create the environment

connect / as sysdba

create or replace directory flash as '/tmp';
grant read, write on directory flash to system;

drop user usr001 cascade;
purge dba_recyclebin;

create user usr001 identified by usr001 default tablespace users temporary tablespace temp;
grant connect, resource to usr001;

connect usr001/usr001
create table part001
(
col001 number,
col002 date,
col003 varchar2(1000)
)
partition by range (col001)
(
partition p001 values less than (500001),
partition p002 values less than (1000001)
);

2. Populate the partitioned table: partition P001 contains 500000 rows and partition P002 contains 10 rows

connect usr001/usr001
begin
for i in 1..500010 loop
insert into part001 values (i, sysdate, lpad (to_char(i), 1000, '0'));
end loop;
commit;
end;
/

3. Start expdp

#> expdp system/passwd directory=flash dumpfile=usr001_1.dmp logfile=exp_usr001_1.log schemas=usr001

4. While step 3 is running, run this in a separate session:

connect usr001/usr001
delete from part001 where col001 in (500001, 500002, 500003, 500004, 500005);
commit;

This will delete 5 rows in partition P002.

5. Expdp completes with:

Export: Release 10.2.0.3.0 - 64bit Production on Friday, 05 September, 2008 13:59:59

Copyright (c) 2003, 2005, Oracle. All rights reserved.
;;;
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options
FLASHBACK automatically enabled to preserve database integrity.
Starting "SYSTEM"."SYS_EXPORT_SCHEMA_02": system/******** directory=flash dumpfile=usr001_1.dmp logfile=exp_usr001_1.log schemas=usr001
Estimate in progress using BLOCKS method...
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 568.0 MB
Processing object type SCHEMA_EXPORT/USER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/TABLE/TABLE
. . exported "USR001"."PART001":"P001" 486.3 MB 500000 rows
. . exported "USR001"."PART001":"P002" 10.50 KB 5 rows
Master table "SYSTEM"."SYS_EXPORT_SCHEMA_02" successfully loaded/unloaded
******************************************************************************
Dump file set for SYSTEM.SYS_EXPORT_SCHEMA_02 is:
/media/usbdisk/TESTS/FLASH/usr001_1.dmp
Job "SYSTEM"."SYS_EXPORT_SCHEMA_02" successfully completed at 14:02:47

=> From partition P002 only 5 rows were exported, so the written export dump is not consistent.
Solution
To generate a consistent Data Pump database or schema export, similar to exports generated with the exp parameter CONSISTENT=Y, use the Data Pump parameters FLASHBACK_SCN or FLASHBACK_TIME.
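A FLASHBACK_SCN variant would look like this sketch (my addition; 1234567 is a placeholder for the SCN you read back from the database):

SQL> select current_scn from v$database;

#> expdp system/passwd directory=flash dumpfile=usr001_3.dmp logfile=exp_usr001_3.log schemas=usr001 flashback_scn=1234567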

In line with the example above, running expdp with:

#> expdp system/passwd directory=flash dumpfile=usr001_2.dmp logfile=exp_usr001_2.log schemas=usr001 flashback_time=\"TO_TIMESTAMP \(TO_CHAR \(SYSDATE, \'YYYY-MM-DD HH24:MI:SS\'\), \'YYYY-MM-DD HH24:MI:SS\'\)\"

This will end with:

Export: Release 10.2.0.3.0 - 64bit Production on Friday, 05 September, 2008 14:15:38

Copyright (c) 2003, 2005, Oracle. All rights reserved.
;;;
Connected to: Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - 64bit Production
With the Partitioning, Oracle Label Security, OLAP and Data Mining Scoring Engine options
FLASHBACK automatically enabled to preserve database integrity.
Starting "SYSTEM"."SYS_EXPORT_SCHEMA_02": system/******** directory=flash dumpfile=usr001_2.dmp logfile=exp_usr001_2.log schemas=usr001 flashback_time="to_timestamp (to_char (sysdate, 'YYYY-MM-DD HH24:MI:SS'), 'YYYY-MM-DD HH24:MI:SS')"
Estimate in progress using BLOCKS method...
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
Total estimation using BLOCKS method: 568.0 MB
Processing object type SCHEMA_EXPORT/USER
Processing object type SCHEMA_EXPORT/SYSTEM_GRANT
Processing object type SCHEMA_EXPORT/ROLE_GRANT
Processing object type SCHEMA_EXPORT/DEFAULT_ROLE
Processing object type SCHEMA_EXPORT/PRE_SCHEMA/PROCACT_SCHEMA
Processing object type SCHEMA_EXPORT/TABLE/TABLE
. . exported "USR001"."PART001":"P001" 486.3 MB 500000 rows
. . exported "USR001"."PART001":"P002" 15.48 KB 10 rows
Master table "SYSTEM"."SYS_EXPORT_SCHEMA_02" successfully loaded/unloaded
******************************************************************************
Dump file set for SYSTEM.SYS_EXPORT_SCHEMA_02 is:
/media/usbdisk/TESTS/FLASH/usr001_2.dmp
Job "SYSTEM"."SYS_EXPORT_SCHEMA_02" successfully completed at 14:16:55

=> Partition P002 contains all 10 rows, even though 5 rows were deleted while expdp was running. The parameter FLASHBACK_TIME guarantees the consistency.

Link: "Doc ID: 377218.1"

https://metalink2.oracle.com/metalink/plsql/f?p=130:14:4374876471460387797::::p14_database_id,p14_docid,p14_show_header,p14_show_help,p14_black_frame,p14_font:NOT,377218.1,1,0,1,helvetica
