DBA Sensation

March 12, 2010

Why Isn’t Oracle Using My Index?!

Filed under: [System Performance tuning], posted by zhefeng @ 4:02 pm

By Jonathan Lewis
http://www.dbazine.com/oracle/or-articles/jlewis12

The question in the title of this piece is probably the single most frequently occurring question that appears in the Metalink forums and Usenet newsgroups. This article uses a test case that you can rebuild on your own systems to demonstrate the most fundamental issues with how cost-based optimisation works. And at the end of the article, you should be much better equipped to give an answer the next time you hear that dreaded question.

Because of the wide variety of options that are available when installing Oracle, it isn't usually safe to predict exactly what will happen when someone runs a script that you have dictated to them. But I'm going to risk it, in the hope that your database is a fairly vanilla installation, with the default values for the most commonly tweaked parameters. The example has been built and tested on an 8.1.7 database with the db_block_size set to the commonly used value of 8K and the db_file_multiblock_read_count set to the equally commonly used value 8. The results may be a little different under Oracle 9.2.

Run the script from Figure 1, which creates a couple of tables, then indexes and analyses them.

-- T1: each n1 value is stored in 15 consecutive rows
create table t1 as
select
    trunc((rownum-1)/15) n1,
    trunc((rownum-1)/15) n2,
    rpad('x', 215) v1
from all_objects
where rownum <= 3000;

-- T2: each n1 value recurs once every 200 rows
create table t2 as
select
    mod(rownum,200) n1,
    mod(rownum,200) n2,
    rpad('x',215) v1
from all_objects
where rownum <= 3000;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

analyze table t1 compute statistics;
analyze table t2 compute statistics;

Figure 1: The test data sets.

Once you have got this data in place, you might want to convince yourself that the two sets of data are identical — in particular, that the N1 columns in both data sets have values ranging from 0 to 199, with 15 occurrences of each value. You might try the following check:

select n1, count(*)
from t1
group by n1;

and the matching query against T2 to prove the point.

If you then execute the queries:

select * from t1 where n1 = 45;
select * from t2 where n1 = 45;

you will find that each query returns 15 rows. However, if you

set autotrace traceonly explain

you will discover that the two queries have different execution paths.

The query against table T1 uses the index, but the query against table T2 does a full tablescan.

So you have two sets of identical data, with dramatically different access paths for the same query.

What Happened to the Index?

Note: if you've ever come across any of those "magic number" guidelines regarding the use of indexes, e.g., "Oracle will use an index for less than 23 percent, 10 percent, 2 percent (pick number at random) of the data," then you may at this stage begin to doubt their validity. In this example, Oracle has used a tablescan for 15 rows out of 3,000, i.e., for just one half of one percent of the data!

To investigate problems like this, there is one very simple ploy that I always try as the first step: Put in some hints to make Oracle do what I think it ought to be doing, and see if that gives me any clues.

In this case, a simple hint:

/*+ index(t2, t2_i1) */

is sufficient to switch Oracle from the full tablescan to the indexed access path. The three paths with costs (abbreviated to C=nnn) are shown in Figure 2:

select * from t1 where n1 = 45;

EXECUTION PLAN
--------------
TABLE ACCESS BY INDEX ROWID OF T1 (C=2)
  INDEX(RANGE SCAN) OF T1_I1 (C=1)

select * from t2 where n1 = 45;

EXECUTION PLAN
--------------
TABLE ACCESS FULL OF T2 (C=15)

select /*+ index(t2 t2_i1) */
    *
from t2
where n1 = 45;

EXECUTION PLAN
--------------
TABLE ACCESS BY INDEX ROWID OF T2 (C=16)
  INDEX(RANGE SCAN) OF T2_I1 (C=1)

Figure 2: The different queries and their costs.

So why hasn't Oracle used the index by default for the T2 query? Easy: as the execution plan shows, the cost of the tablescan is lower than the cost of using the index.

Why is the Tablescan Cheaper?

This, of course, is simply begging the question. Why is the cost of the tablescan cheaper than the cost of using the index?

By looking into this question, you uncover the key mechanisms (and critically erroneous assumptions) of the Cost Based Optimiser.

Let's start by examining the indexes by running the query:

select
table_name,
blevel,
avg_data_blocks_per_key,
avg_leaf_blocks_per_key,
clustering_factor
from user_indexes;

The results are given in the table below:

                     T1      T2
Blevel                1       1
Data block / key      1      15
Leaf block / key      1       1
Clustering factor    96    3000

Note particularly the value for "data blocks per key." This is the number of different blocks in the table that Oracle thinks it will have to visit if you execute a query that contains an equality test on a complete key value for this index.

So where do the costs for our queries come from? As far as Oracle is concerned, if we fire in the key value 45, we get the data from table T1 by hitting one index leaf block and one table block — two blocks, so a cost of two.

If we try the same with table T2, we have to hit one index leaf block and 15 table blocks — a total of 16 blocks, so a cost of 16.

Clearly, according to this viewpoint, the index on table T1 is much more desirable than the index on table T2. This leaves two questions outstanding, though:

Where does the tablescan cost come from, and why are the figures for the avg_data_blocks_per_key so different between the two tables?

The answer to the second question is simple. Look back at the definition of table T1. It uses the trunc() function to generate the N1 values, dividing "rownum - 1" by 15 and truncating.

Trunc(675/15) = 45
Trunc(676/15) = 45
...
Trunc(689/15) = 45

All the rows with the value 45 do actually appear one after the other in a tight little clump (probably all fitting one data block) in the table.

Table T2 uses the mod() function to generate the N1 values, using modulus 200 on the rownum:

mod(45,200) = 45
mod(245,200) = 45
...
mod(2845,200) = 45

The rows with the value 45 appear every two hundredth position in the table (probably resulting in no more than one row in every relevant block).
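The difference in scatter can be reproduced with a rough model in a few lines of Python. This is only a sketch: it assumes rows are stored in insertion order at a fixed 32 rows per block (3,000 rows in roughly 96 blocks), and the names ROWS_PER_BLOCK, blocks_for_key, and clustering_factor are illustrative, not Oracle APIs.

```python
ROWS = 3000
ROWS_PER_BLOCK = 32  # assumption: about 3000 rows / 96 blocks, rounded


def block_of(rownum):
    """Block holding a row, assuming rows are stored in insertion order."""
    return (rownum - 1) // ROWS_PER_BLOCK


# (n1, rownum) pairs generated the same way as the two CREATE TABLE statements
t1 = [((r - 1) // 15, r) for r in range(1, ROWS + 1)]   # trunc((rownum-1)/15)
t2 = [(r % 200, r) for r in range(1, ROWS + 1)]         # mod(rownum,200)


def blocks_for_key(rows, key):
    """Number of distinct table blocks holding rows with n1 = key."""
    return len({block_of(r) for n1, r in rows if n1 == key})


def clustering_factor(rows):
    """Walk the rows in index order (n1, then rownum) and count block changes."""
    changes, previous = 0, None
    for _, r in sorted(rows):
        if block_of(r) != previous:
            changes += 1
            previous = block_of(r)
    return changes


print(blocks_for_key(t1, 45), blocks_for_key(t2, 45))
print(clustering_factor(t1), clustering_factor(t2))
```

Under these assumptions the value 45 occupies a single block in T1 but 15 different blocks in T2, and the modelled clustering factor of T2 comes out at 3,000. The T1 figure lands close to, but not exactly on, the reported 96 because the rows-per-block figure is only approximate.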

By doing the analyze, Oracle was able to get a perfect description of the data scatter in our table. So the optimiser was able to work out exactly how many blocks Oracle would have to visit to answer our query and, in simple cases, the number of block visits is the cost of the query.

But Why the Tablescan?

So we see that an indexed access into T2 is more expensive than the same path into T1, but why has Oracle switched to the tablescan?

This brings us to the two simple-minded, and rather inappropriate, assumptions that Oracle makes.

The first is that every block acquisition equates to a physical disk read, and the second is that a multiblock read is just as quick as a single block read.

So what impact do these assumptions have on our experiment?

If you query the user_tables view with the following SQL:

select
table_name,
blocks
from user_tables;

you will find that our two tables each cover 96 blocks.

At the start of the article, I pointed out that the test case was running a version 8 system with the value 8 for the db_file_multiblock_read_count.

Roughly speaking, Oracle has decided that it can read the entire 96 block table in 96/8 = 12 disk read requests.

Since it takes 16 block (= disk read) requests to access the table by index, it is clearly quicker (from Oracle's sadly deluded perspective) to scan the table. After all, 12 is less than 16.
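The comparison the optimiser is making can be written out as a small Python sketch. It follows the simplified model described so far (one read per block visited through the index, and table blocks divided by the multiblock read count for a scan); the function names are made up for illustration.

```python
def index_cost(leaf_blocks_per_key, data_blocks_per_key):
    """Simple model from the text: one read per leaf block and table block visited."""
    return leaf_blocks_per_key + data_blocks_per_key


def naive_tablescan_cost(table_blocks, multiblock_read_count):
    """Simplified model: number of multiblock read requests needed to scan the table."""
    return table_blocks / multiblock_read_count


t1_index = index_cost(1, 1)
t2_index = index_cost(1, 15)
scan = naive_tablescan_cost(96, 8)

for name, idx in (("T1", t1_index), ("T2", t2_index)):
    choice = "index" if idx < scan else "tablescan"
    print(f"{name}: index={idx} scan={scan:.0f} -> {choice}")
```

With the figures from user_indexes and user_tables this gives an index cost of 2 for T1 and 16 for T2 against a scan cost of 12, so only T1 keeps its indexed path.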

Voila! If the data you are targeting is suitably scattered across the table, you get tablescans even for a very small percentage of the data, a problem that can be exaggerated in the case of very big blocks and very small rows.

Correction

In fact, you will have noticed that my calculated number of scan reads was 12, whilst the cost reported in the execution plan was 15. It is a slight simplification to say that the cost of a tablescan (or an index fast full scan for that matter) is

'number of blocks' / db_file_multiblock_read_count.

Oracle uses an "adjusted" multi-block read value for the calculation (although it then tries to use the actual requested size when the scan starts to run).

For reference, the following table compares a few of the actual and adjusted values:

Actual    Adjusted
     4       4.175
     8       6.589
    16      10.398
    32      16.409
    64      25.895
   128      40.865

As you can see, Oracle makes some attempt to protect you from the error of supplying an unfeasibly large value for this parameter.
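As a quick check on the arithmetic, here is a Python sketch that applies the adjusted values from the table above to the 96-block test tables. Rounding the result up is an assumption on my part, chosen because it reproduces the cost of 15 shown in Figure 2; the function name is illustrative.

```python
import math

# Adjusted multiblock read values quoted in the table above
ADJUSTED_MBRC = {4: 4.175, 8: 6.589, 16: 10.398, 32: 16.409, 64: 25.895, 128: 40.865}


def adjusted_tablescan_cost(table_blocks, multiblock_read_count):
    """Blocks divided by the adjusted read size, rounded up (rounding is an assumption)."""
    return math.ceil(table_blocks / ADJUSTED_MBRC[multiblock_read_count])


print(adjusted_tablescan_cost(96, 8))
```

For 96 blocks and a db_file_multiblock_read_count of 8 this gives 15, which is still lower than the indexed-path cost of 16 for T2.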

There is a minor change in version 9, by the way, where the tablescan cost is further adjusted by adding one to the result of the division, which means tablescans in V9 are generally just a little more expensive than in V8, so indexes are just a little more likely to be used.

Adjustments

We have seen that there are two assumptions built into the optimizer that are not very sensible.

* A single block read costs just as much as a multi-block read (not really likely, particularly when running on file systems without direct I/O)
* A block access will be a physical disk read — (so what is the buffer cache for?)

Since the early days of Oracle 8.1, there have been a couple of parameters that allow us to correct these assumptions in a reasonably truthful way.

See Tim Gorman's article for a proper description of these parameters, but briefly:

Optimizer_index_cost_adj takes a value between 1 and 10000 with a default of 100. Effectively, this parameter describes how cheap a single block read is compared to a multiblock read. For example the value 30 (which is often a suitable first guess for an OLTP system) would tell Oracle that a single block read costs 30% of a multiblock read. Oracle would therefore incline towards using indexed access paths for low values of this parameter.

Optimizer_index_caching takes a value between 0 and 100 with a default of 0. This tells Oracle to assume that that percentage of index blocks will be found in the buffer cache. In this case, setting values close to 100 encourages the use of indexes over tablescans.

The really nice thing about both these parameters is that they can be set to "truthful" values.

Set the optimizer_index_caching to something in the region of the "buffer cache hit ratio." (You have to make your own choice about whether this should be the figure derived from the default pool, keep pool, or both).

The optimizer_index_cost_adj is a little more complicated. Check the typical wait times in v$system_event for the events "db file scattered read" (multiblock reads) and "db file sequential read" (single block reads). Divide the latter by the former and multiply by one hundred.
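The calculation is simple enough to sketch in Python. The wait times below are hypothetical, not measured figures, and the final step assumes the simplest possible model in which the indexed-path cost is just scaled by the parameter divided by 100; the real formula is version dependent.

```python
def index_cost_adj(avg_single_block_wait, avg_multiblock_wait):
    """100 * single-block read time / multiblock read time, kept within 1..10000."""
    value = round(100 * avg_single_block_wait / avg_multiblock_wait)
    return max(1, min(10000, value))


# Hypothetical average waits in milliseconds, not measured figures
single_block_ms = 6    # 'db file sequential read'
multi_block_ms = 20    # 'db file scattered read'

adj = index_cost_adj(single_block_ms, multi_block_ms)
print(adj)

# Assumed simple model: the indexed-path cost is scaled by adj / 100
scaled_t2_index_cost = 16 * adj / 100
print(scaled_t2_index_cost)
```

With these made-up waits the parameter comes out at 30, and under the assumed scaling the T2 indexed path would drop from 16 to 4.8, well below the tablescan cost.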
Improvements

Don't forget that the two parameters may need to be adjusted at different times of the day and week to reflect the end-user workload. You can't just derive one pair of figures, and use them for ever.

Happily, in Oracle 9, things have improved. You can now collect system statistics, which originally included just four:

+ Average single block read time
+ Average multi block read time
+ Average actual multiblock read
+ Notional usable CPU speed.

Suffice it to say that this feature is worth an article in its own right — but do note that the first three allow Oracle to discover the truth about the cost of multi block reads. And in fact, the CPU speed allows Oracle to work out the CPU cost of unsuitable access mechanisms like reading every single row in a block to find a specific data value and behave accordingly.

When you migrate to version 9, one of the first things you should investigate is the correct use of system statistics. This one feature alone may reduce the amount of time you spend trying to "tune" awkward SQL.

In passing, despite the wonderful effect of system statistics, both of the optimizer adjusting parameters still apply, although the exact formula for their use seems to have changed between version 8 and version 9.

Variations on a Theme

Of course, I have picked one very special case (equality on a single-column non-unique index, where there are no nulls in the table) and treated it very simply. (I haven't even mentioned the relevance of the index blevel and clustering_factor yet.) There are numerous different strategies that Oracle uses to work out more general cases.

Consider some of the cases I have conveniently overlooked:

+ Multi-column indexes
+ Part-used multi-column indexes
+ Range scans
+ Unique indexes
+ Non-unique indexes representing unique constraints
+ Index skip scans
+ Index only queries
+ Bitmap indexes
+ Effects of nulls

The list goes on and on. There is no one simple formula that tells you how Oracle works out a cost — there is only a general guideline that gives you the flavour of the approach and a list of different formulae that apply in different cases.

However, the purpose of this article was to make you aware of the general approach and the two assumptions built into the optimiser's strategy. And I hope that this may be enough to take you a long way down the path of understanding the (apparently) strange things that the optimiser has been known to do.


March 8, 2010

Index Full Scan vs Index Fast Full Scan

Filed under: [System Performance tuning], posted by zhefeng @ 2:06 pm

Author: 汪海 (Wanghai)
Date: 14-Aug-2005
Source: http://spaces.msn.com/members/wzwanghai/

Are index full scan and index fast full scan the same thing? The answer is no. Although the two names look almost alike, the mechanisms behind them are completely different. Let's look at where they differ.

First, where can IFS and FFS be used? If every column a SQL statement needs is contained in an index, then either an index full scan or an index fast full scan can be used in place of a full table scan. For example:

SQL> CREATE TABLE TEST AS SELECT * FROM dba_objects WHERE 0=1;

SQL> CREATE INDEX ind_test_id ON TEST(object_id);

SQL> INSERT INTO TEST
SELECT *
FROM dba_objects
WHERE object_id IS NOT NULL AND object_id > 10000
ORDER BY object_id DESC;

17837 rows created.

SQL> analyze table test compute statistics for table for all columns for all indexes;

Table analyzed.

SQL> set autotrace trace;

SQL> select object_id from test;

17837 rows selected.

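Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=68 Card=17837 Bytes=71348)
   1    0   TABLE ACCESS (FULL) OF 'TEST' (Cost=68 Card=17837 Bytes=71348)

At this point Oracle chooses a full table scan, because the object_id column is nullable by default. A B-tree index holds no entry for a row whose indexed columns are all null, so the optimizer cannot rely on the index containing every row. Let's change the column to NOT NULL: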

SQL>alter table test modify(object_id not null);

SQL> select object_id from test;

17837 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=11 Card=17837 Bytes=71348)
   1    0   INDEX (FAST FULL SCAN) OF 'IND_TEST_ID' (NON-UNIQUE) (Cost=11 Card=17837 Bytes=71348)

Of course, we can also use an index full scan:

SQL> select/*+ index(test ind_TEST_ID)*/ object_id from test;

17837 rows selected.

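Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=17837 Bytes=71348)
   1    0   INDEX (FULL SCAN) OF 'IND_TEST_ID' (NON-UNIQUE) (Cost=101 Card=17837 Bytes=71348)

We have seen that both can be used in this situation, so what is the difference between them? One place the difference shows up is in their output. To make it easier to see, we fetch only 10 rows.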

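INDEX FAST FULL SCAN

SQL> select object_id from test where rownum < 11;

[query output missing from the source]

INDEX FULL SCAN

SQL> select/*+ index(test ind_TEST_ID)*/ object_id from test where rownum < 11;

[query output missing from the source]

SQL> select object_id from dba_objects where object_name='IND_TEST_ID';

OBJECT_ID
----------
70591

The index's object_id is 70591. A tree dump shows the structure of the index tree: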

SQL> ALTER SESSION SET EVENTS 'immediate trace name TREEDUMP level 70591';

----- begin tree dump
branch: 0x6809b8d 109091725 (0: nrow: 100, level: 1)
leaf: 0x6809b96 109091734 (-1: nrow: 294 rrow: 0)
leaf: 0x6c07ec1 113278657 (0: nrow: 262 rrow: 0)
leaf: 0x6c07ebd 113278653 (1: nrow: 518 rrow: 0)
leaf: 0x6c07eb1 113278641 (2: nrow: 524 rrow: 0)
leaf: 0x6c07ead 113278637 (3: nrow: 524 rrow: 0)
leaf: 0x6c07ea9 113278633 (4: nrow: 524 rrow: 0)
leaf: 0x6c07ea5 113278629 (5: nrow: 524 rrow: 0)
leaf: 0x6c07ea1 113278625 (6: nrow: 524 rrow: 0)
leaf: 0x6c07e9d 113278621 (7: nrow: 524 rrow: 0)
leaf: 0x6c07e99 113278617 (8: nrow: 524 rrow: 0)
leaf: 0x6c07e95 113278613 (9: nrow: 532 rrow: 0)
leaf: 0x6c07e91 113278609 (10: nrow: 524 rrow: 0)
leaf: 0x6c07e8d 113278605 (11: nrow: 524 rrow: 0)
leaf: 0x6c07ec8 113278664 (12: nrow: 524 rrow: 0)
leaf: 0x6c07ec4 113278660 (13: nrow: 524 rrow: 0)
leaf: 0x6c07ec0 113278656 (14: nrow: 524 rrow: 0)
leaf: 0x6c07ebc 113278652 (15: nrow: 524 rrow: 0)
leaf: 0x6809bb2 109091762 (16: nrow: 524 rrow: 0)
leaf: 0x6c07eb8 113278648 (17: nrow: 524 rrow: 0)
leaf: 0x6c07eb4 113278644 (18: nrow: 524 rrow: 0)
leaf: 0x6c07eb0 113278640 (19: nrow: 524 rrow: 0)
leaf: 0x6c07eac 113278636 (20: nrow: 524 rrow: 0)
leaf: 0x6809bae 109091758 (21: nrow: 524 rrow: 0)
leaf: 0x6c07ea8 113278632 (22: nrow: 524 rrow: 0)
leaf: 0x6c07ea4 113278628 (23: nrow: 524 rrow: 0)
leaf: 0x6c07ea0 113278624 (24: nrow: 105 rrow: 105)
leaf: 0x6c07e9c 113278620 (25: nrow: 129 rrow: 129)
leaf: 0x6c07eb9 113278649 (26: nrow: 123 rrow: 123)
leaf: 0x6809baa 109091754 (27: nrow: 246 rrow: 246)
leaf: 0x6c07e98 113278616 (28: nrow: 246 rrow: 246)
leaf: 0x6c07e94 113278612 (29: nrow: 246 rrow: 246)
leaf: 0x6809ba6 109091750 (30: nrow: 246 rrow: 246)
leaf: 0x6809bce 109091790 (31: nrow: 246 rrow: 246)
leaf: 0x6809bca 109091786 (32: nrow: 246 rrow: 246)
leaf: 0x6809c05 109091845 (33: nrow: 248 rrow: 248)
leaf: 0x6809c01 109091841 (34: nrow: 246 rrow: 246)
leaf: 0x6809bfd 109091837 (35: nrow: 246 rrow: 246)
leaf: 0x6809bf9 109091833 (36: nrow: 246 rrow: 246)
leaf: 0x6809bf5 109091829 (37: nrow: 246 rrow: 246)
leaf: 0x6809bf1 109091825 (38: nrow: 246 rrow: 246)
leaf: 0x6809bed 109091821 (39: nrow: 246 rrow: 246)
leaf: 0x6809be9 109091817 (40: nrow: 246 rrow: 246)
leaf: 0x6809be5 109091813 (41: nrow: 246 rrow: 246)
leaf: 0x6809be1 109091809 (42: nrow: 246 rrow: 246)
leaf: 0x6809bdd 109091805 (43: nrow: 246 rrow: 246)
leaf: 0x6809bd9 109091801 (44: nrow: 246 rrow: 246)
leaf: 0x6809bd5 109091797 (45: nrow: 246 rrow: 246)
leaf: 0x6809bd1 109091793 (46: nrow: 248 rrow: 248)
leaf: 0x6809bcd 109091789 (47: nrow: 246 rrow: 246)
leaf: 0x6809bc9 109091785 (48: nrow: 246 rrow: 246)
leaf: 0x6809c08 109091848 (49: nrow: 246 rrow: 246)
leaf: 0x6809c04 109091844 (50: nrow: 246 rrow: 246)
leaf: 0x6809c00 109091840 (51: nrow: 246 rrow: 246)
leaf: 0x6809bfc 109091836 (52: nrow: 246 rrow: 246)
leaf: 0x6809bf8 109091832 (53: nrow: 246 rrow: 246)
leaf: 0x6809bf4 109091828 (54: nrow: 246 rrow: 246)
leaf: 0x6809bf0 109091824 (55: nrow: 246 rrow: 246)
leaf: 0x6809bec 109091820 (56: nrow: 246 rrow: 246)
leaf: 0x6809be8 109091816 (57: nrow: 246 rrow: 246)
leaf: 0x6809be4 109091812 (58: nrow: 246 rrow: 246)
leaf: 0x6809be0 109091808 (59: nrow: 248 rrow: 248)
leaf: 0x6809bdc 109091804 (60: nrow: 246 rrow: 246)
leaf: 0x6809bd8 109091800 (61: nrow: 246 rrow: 246)
leaf: 0x6809bd4 109091796 (62: nrow: 246 rrow: 246)
leaf: 0x6809bd0 109091792 (63: nrow: 246 rrow: 246)
leaf: 0x6809bcc 109091788 (64: nrow: 246 rrow: 246)
leaf: 0x6809c07 109091847 (65: nrow: 246 rrow: 246)
leaf: 0x6809c03 109091843 (66: nrow: 246 rrow: 246)
leaf: 0x6809bff 109091839 (67: nrow: 246 rrow: 246)
leaf: 0x6809bfb 109091835 (68: nrow: 246 rrow: 246)
leaf: 0x6809bf7 109091831 (69: nrow: 246 rrow: 246)
leaf: 0x6809bf3 109091827 (70: nrow: 246 rrow: 246)
leaf: 0x6809bef 109091823 (71: nrow: 246 rrow: 246)
leaf: 0x6809beb 109091819 (72: nrow: 248 rrow: 248)
leaf: 0x6809be7 109091815 (73: nrow: 246 rrow: 246)
leaf: 0x6809be3 109091811 (74: nrow: 246 rrow: 246)
leaf: 0x6809bdf 109091807 (75: nrow: 246 rrow: 246)
leaf: 0x6809bdb 109091803 (76: nrow: 246 rrow: 246)
leaf: 0x6809bd7 109091799 (77: nrow: 246 rrow: 246)
leaf: 0x6809bd3 109091795 (78: nrow: 246 rrow: 246)
leaf: 0x6809bcf 109091791 (79: nrow: 246 rrow: 246)
leaf: 0x6809bcb 109091787 (80: nrow: 246 rrow: 246)
leaf: 0x6809c06 109091846 (81: nrow: 246 rrow: 246)
leaf: 0x6809c02 109091842 (82: nrow: 246 rrow: 246)
leaf: 0x6809bfe 109091838 (83: nrow: 246 rrow: 246)
leaf: 0x6809bfa 109091834 (84: nrow: 246 rrow: 246)
leaf: 0x6809ba2 109091746 (85: nrow: 129 rrow: 129)
leaf: 0x6c07eb5 113278645 (86: nrow: 123 rrow: 123)
leaf: 0x6809bf6 109091830 (87: nrow: 246 rrow: 246)
leaf: 0x6809bf2 109091826 (88: nrow: 246 rrow: 246)
leaf: 0x6809bee 109091822 (89: nrow: 246 rrow: 246)
leaf: 0x6809bea 109091818 (90: nrow: 246 rrow: 246)
leaf: 0x6809b9e 109091742 (91: nrow: 246 rrow: 246)
leaf: 0x6809be6 109091814 (92: nrow: 246 rrow: 246)
leaf: 0x6809be2 109091810 (93: nrow: 246 rrow: 246)
leaf: 0x6809bde 109091806 (94: nrow: 246 rrow: 246)
leaf: 0x6809bda 109091802 (95: nrow: 246 rrow: 246)
leaf: 0x6809b9a 109091738 (96: nrow: 246 rrow: 246)
leaf: 0x6809bd6 109091798 (97: nrow: 246 rrow: 246)
leaf: 0x6809bd2 109091794 (98: nrow: 246 rrow: 246)
----- end tree dump

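The index full scan starts reading data at block 0x6c07ea0, whereas the index fast full scan starts at block 0x6809b9a, which is the block containing data that sits earliest in physical storage. Let's look at the contents of these two blocks.
0x6c07ea0 = 113278624 in decimal
0x6809b9a = 109091738 in decimal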
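The choice of starting block can be illustrated with a short Python sketch over a handful of leaf entries copied from the tree dump. It is a simplification: the full scan is modelled as "first leaf in key order that still holds rows", and the fast full scan as "lowest block address that still holds rows", with address order standing in for physical order.

```python
# (block address, position in key order, nrow, rrow) copied from a few tree dump lines
leaves = [
    (0x6809b96, -1, 294, 0),
    (0x6c07ec1, 0, 262, 0),
    (0x6c07ea4, 23, 524, 0),
    (0x6c07ea0, 24, 105, 105),
    (0x6c07e9c, 25, 129, 129),
    (0x6809ba2, 85, 129, 129),
    (0x6809b9e, 91, 246, 246),
    (0x6809b9a, 96, 246, 246),
    (0x6809bd2, 98, 246, 246),
]

live = [leaf for leaf in leaves if leaf[3] > 0]   # leaves that still hold rows

# Index full scan: follow the leaves in key order
first_full_scan = min(live, key=lambda leaf: leaf[1])
# Index fast full scan: read blocks in physical order (modelled here as address order)
first_fast_full_scan = min(live, key=lambda leaf: leaf[0])

print(hex(first_full_scan[0]), hex(first_fast_full_scan[0]))
```

On this sample the two scans start at 0x6c07ea0 and 0x6809b9a respectively, the same two blocks examined in the rest of the article.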

SQL> select dbms_utility.data_block_address_file(113278624) "file", dbms_utility.data_block_address_block(113278624) "block" from dual;

      file      block
---------- ----------
        27      32416

SQL> select dbms_utility.data_block_address_file(109091738) "file", dbms_utility.data_block_address_block(109091738) "block" from dual;

      file      block
---------- ----------
        26      39834
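The same conversion can be done by hand. The Python sketch below assumes the usual layout of a data block address, with the relative file number in the top 10 bits and the block number in the low 22 bits; decode_dba is an illustrative name, not an Oracle function.

```python
def decode_dba(dba):
    """Split a data block address into (relative file number, block number)."""
    return dba >> 22, dba & 0x3FFFFF   # top 10 bits: file, low 22 bits: block


for dba in (0x6C07EA0, 0x6809B9A, 0x6809B8D):
    print(hex(dba), dba, decode_dba(dba))
```

The first two addresses decode to file 27 block 32416 and file 26 block 39834, matching the dbms_utility output. The third is the branch block from the tree dump, which decodes to file 26 block 39821.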

SQL> alter system dump datafile 27 block 32416;

SQL> alter system dump datafile 26 block 39834;

The first 10 rows of block 32416:

row#0[6564] flag: -----, lock: 2
col 0; len 4; (4): c3 02 07 11
col 1; len 6; (6): 07 00 7c 20 00 2b
row#1[6578] flag: -----, lock: 2
col 0; len 4; (4): c3 02 16 4e
col 1; len 6; (6): 07 00 7c 20 00 2a
row#2[6592] flag: -----, lock: 2
col 0; len 4; (4): c3 02 16 4f
col 1; len 6; (6): 07 00 7c 20 00 29
row#3[6606] flag: -----, lock: 2
col 0; len 4; (4): c3 02 16 50
col 1; len 6; (6): 07 00 7c 20 00 28
row#4[6620] flag: -----, lock: 2
col 0; len 4; (4): c3 02 18 02
col 1; len 6; (6): 07 00 7c 20 00 27
row#5[6634] flag: -----, lock: 2
col 0; len 4; (4): c3 02 23 60
col 1; len 6; (6): 07 00 7c 20 00 26
row#6[6648] flag: -----, lock: 2
col 0; len 4; (4): c3 02 24 25
col 1; len 6; (6): 07 00 7c 20 00 25
row#7[6662] flag: -----, lock: 2
col 0; len 4; (4): c3 02 24 28
col 1; len 6; (6): 07 00 7c 20 00 24
row#8[6676] flag: -----, lock: 2
col 0; len 4; (4): c3 02 28 18
col 1; len 6; (6): 07 00 7c 20 00 23
row#9[6690] flag: -----, lock: 2
col 0; len 4; (4): c3 02 42 04
col 1; len 6; (6): 07 00 7c 20 00 22

The first 10 rows of block 39834:

row#0[4591] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 43
col 1; len 6; (6): 02 81 71 f6 00 36
row#1[4605] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 44
col 1; len 6; (6): 02 81 71 f6 00 35
row#2[4619] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 45
col 1; len 6; (6): 02 81 71 f6 00 34
row#3[4633] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 46
col 1; len 6; (6): 02 81 71 f6 00 33
row#4[4647] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 47
col 1; len 6; (6): 02 81 71 f6 00 32
row#5[4661] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 48
col 1; len 6; (6): 02 81 71 f6 00 31
row#6[4675] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 49
col 1; len 6; (6): 02 81 71 f6 00 30
row#7[4689] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 4a
col 1; len 6; (6): 02 81 71 f6 00 2f
row#8[4703] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 4b
col 1; len 6; (6): 02 81 71 f6 00 2e
row#9[4717] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 4c
col 1; len 6; (6): 02 81 71 f6 00 2d

Compare this with the result sets above.
The first row of block 32416 is 10616, whose internal storage format should be:

SQL> select dump(10616,16) from dual;

DUMP(10616,16)
----------------------
Typ=2 Len=4: c3,2,7,11

This does indeed match what the block dump shows:

row#0[6564] flag: -----, lock: 2
col 0; len 4; (4): c3 02 07 11
col 1; len 6; (6): 07 00 7c 20 00 2b

Now look at the first row of block 39834:

SQL> select dump(66266,16) from dual;

DUMP(66266,16)
-----------------------
Typ=2 Len=4: c3,7,3f,43

This also matches the dump:

row#0[4591] flag: -----, lock: 2
col 0; len 4; (4): c3 07 3f 43
col 1; len 6; (6): 02 81 71 f6 00 36
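The key bytes can also be decoded directly. The Python sketch below assumes the usual storage format for a positive Oracle NUMBER: the first byte is 193 plus a base-100 exponent, and each following byte is a base-100 digit plus one. It only handles positive integers, and decode_positive_number is an illustrative name.

```python
def decode_positive_number(hex_bytes):
    """Decode the dumped bytes of a positive Oracle NUMBER holding an integer."""
    raw = [int(b, 16) for b in hex_bytes.split()]
    exponent = raw[0] - 193            # power of 100 applied to the first digit
    value = 0
    for i, byte in enumerate(raw[1:]):
        value += (byte - 1) * 100 ** (exponent - i)   # each byte is a base-100 digit plus 1
    return value


for key in ("c3 02 07 11", "c3 07 3f 43", "c3 02 16 4e"):
    print(key, decode_positive_number(key))
```

The first two keys decode to 10616 and 66266, the values passed to dump() above, and the third (row#1 of block 32416) decodes to 12177.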

This demonstrates the difference between index full scan and index fast full scan described above.
We can also use event 10046 to trace the path each one takes.

SQL> ALTER SESSION SET EVENTS 'immediate trace name flush_cache';

(This flushes the buffer cache so that the 'db file sequential read' and 'db file scattered read' events can be observed.)

SQL> alter session set events '10046 trace name context forever,level 12';

Session altered.

SQL> select object_id from test where rownum < 11;

[query output missing from the source]

SQL> alter session set events '10046 trace name context off';

Session altered.

[oracle@csdbc udump]$ grep read cs-dbc_ora_15596.trc

Redo thread mounted by this instance: 1
WAIT #1: nam='db file sequential read' ela= 33 p1=26 p2=39820 p3=1
WAIT #1: nam='db file sequential read' ela= 21 p1=26 p2=39817 p3=1
WAIT #1: nam='db file sequential read' ela= 17 p1=26 p2=39819 p3=1
WAIT #1: nam='db file parallel read' ela= 53 p1=2 p2=2 p3=2
WAIT #1: nam='db file scattered read' ela= 466 p1=26 p2=39821 p3=16

The 'db file sequential read' waits at the top come from operations such as reading the segment header. The one to focus on is the 'db file scattered read' event: because an index fast full scan uses multiblock reads, it reads db_file_multiblock_read_count blocks (set to 16 in this example) starting from block 39821. Block 39834, the one we care about, lies within that range.
Now look at the 10046 trace of the index full scan:

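SQL> ALTER SESSION SET EVENTS 'immediate trace name flush_cache';

(This flushes the buffer cache so that the 'db file sequential read' and 'db file scattered read' events can be observed.)

SQL> alter session set events '10046 trace name context forever,level 12';

Session altered.

SQL> select/*+ index(test ind_TEST_ID)*/ object_id from test where rownum < 11;

OBJECT_ID
----------
10616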
12177
12178
12179
12301
13495
13536
13539
13923
16503

10 rows selected.

SQL> alter session set events '10046 trace name context off';

Session altered.

[oracle@csdbc udump]$ grep read cs-dbc_ora_15609.trc

Redo thread mounted by this instance: 1
WAIT #1: nam='db file sequential read' ela= 49 p1=26 p2=39821 p3=1
(root block, the same 0x6809b8d seen in the index tree dump earlier)
WAIT #1: nam='db file sequential read' ela= 32 p1=26 p2=39830 p3=1
WAIT #1: nam='db file sequential read' ela= 40 p1=27 p2=32449 p3=1
WAIT #1: nam='db file sequential read' ela= 35 p1=27 p2=32445 p3=1
WAIT #1: nam='db file sequential read' ela= 28 p1=27 p2=32433 p3=1
WAIT #1: nam='db file sequential read' ela= 19 p1=27 p2=32429 p3=1
WAIT #1: nam='db file sequential read' ela= 34 p1=27 p2=32425 p3=1
WAIT #1: nam='db file sequential read' ela= 32 p1=27 p2=32421 p3=1
WAIT #1: nam='db file sequential read' ela= 33 p1=27 p2=32417 p3=1
WAIT #1: nam='db file sequential read' ela= 29 p1=27 p2=32413 p3=1
WAIT #1: nam='db file sequential read' ela= 37 p1=27 p2=32409 p3=1
WAIT #1: nam='db file sequential read' ela= 32 p1=27 p2=32405 p3=1
WAIT #1: nam='db file sequential read' ela= 35 p1=27 p2=32401 p3=1
WAIT #1: nam='db file sequential read' ela= 34 p1=27 p2=32397 p3=1
WAIT #1: nam='db file sequential read' ela= 31 p1=27 p2=32456 p3=1
WAIT #1: nam='db file sequential read' ela= 29 p1=27 p2=32452 p3=1
WAIT #1: nam='db file sequential read' ela= 31 p1=27 p2=32448 p3=1
WAIT #1: nam='db file sequential read' ela= 30 p1=27 p2=32444 p3=1
WAIT #1: nam='db file sequential read' ela= 38 p1=26 p2=39858 p3=1
WAIT #1: nam='db file sequential read' ela= 31 p1=27 p2=32440 p3=1
WAIT #1: nam='db file sequential read' ela= 32 p1=27 p2=32436 p3=1
WAIT #1: nam='db file sequential read' ela= 35 p1=27 p2=32432 p3=1
WAIT #1: nam='db file sequential read' ela= 31 p1=27 p2=32428 p3=1
WAIT #1: nam='db file sequential read' ela= 29 p1=26 p2=39854 p3=1
WAIT #1: nam='db file sequential read' ela= 36 p1=27 p2=32424 p3=1
WAIT #1: nam='db file sequential read' ela= 32 p1=27 p2=32420 p3=1
WAIT #1: nam='db file sequential read' ela= 36 p1=27 p2=32416 p3=1

The path taken by the index full scan is exactly the one mentioned at the start of the article: locate the root block, then read blocks one after another by following the leaf block linked list. By now you should have a fairly good understanding of the difference between index full scan and index fast full scan. Finally, a note on how the two differ with respect to sorting.

SQL> set autotrace trace;

SQL> select object_id from test order by object_id;

17837 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=41 Card=17837 Bytes=71348)
   1    0   INDEX (FULL SCAN) OF 'IND_TEST_ID' (NON-UNIQUE) (Cost=101 Card=17837 Bytes=71348)

Because of the ORDER BY, Oracle automatically chose the index full scan, which avoids the sort. What happens if we force an index fast full scan?

SQL> select/*+ index_ffs(test ind_test_id)*/object_id from test order by object_id;
17837 rows selected.

Execution Plan
----------------------------------------------------------
   0      SELECT STATEMENT Optimizer=CHOOSE (Cost=59 Card=17837 Bytes=71348)
   1    0   SORT (ORDER BY) (Cost=59 Card=17837 Bytes=71348)
   2    1     INDEX (FAST FULL SCAN) OF 'IND_TEST_ID' (NON-UNIQUE) (Cost=11 Card=17837 Bytes=71348)

The index fast full scan needs an extra SORT ORDER BY step. Readers who have followed this article closely will know why; if you don't, look for the answer in the article.
