How to estimate the cost if partition table missing partition level statistics ?(缺失分区表级统计信息如何计算多分区扫描的COST)
月初3号时一套数据库SQL突然变慢,查看SQL文本就是一个普通的select语句,查询的是以月份命名的再以天划分新建的分区表,SQL语句中有一个选择率很好的手机号的字段,语句是查询该手机最近两天的记录,但是查看当时的执行计划使用的是PARTITION RANGE ITERATOR FTS 分区全表扫, 该手机号字段上有索引, 只有2天的数据但是记录已经几千万条记录, 为什么不使用索引呢?
如select xx,xx from tab201508 where time between date’20150801′ and date’20150803′ and tel=’17000000000′;
首先查看表的统计信息,分区表的统计信息分global stats(DBA_TABLE)和partition stats(DBA_TAB_PARTITION),发现该分区表是global stats 有,partition stats全为空, 查看该SQL发现有两个子游标有不同的执行计划,#1使用的是了正确的索引,#2使用就是当前的多分区全表扫描, 查看#2第一次解析到shared_pool时间是9:18 分, 为什么执行计划会改变呢?
再次查看表的global stats 的last_analyzed 刚好是9:10分, 收集统计信息有可能导致执行计划改变, 查看收集统计信息的scheduler GATHER_STATS_JOB 果然是在running, 工作日收集统计信息的时间window是22:00 开始持续4个小时 , 周末6:00开始持续20个小时,刚好当天是周末,查看了GATHER_STATS_JOB的运行日志,前两天都因为时间窗口关闭而终止, 让我们大胆的猜测一下会不会是这样:该SQL 在9:10之前第一次解析时因为表的统计信息都不存在,使用的动态采样生成了使用索引的正确的执行计划, 后来因为收集统计信息的job 在收集到该表时,先是产生了表的global stats, 但是partition stats没有收集前,利用当前分区表上的global stats解析生成了#2 sql游标,使用了错误的执行计划,收集统计的job GRANULARITY默认是AUTO, 理论上是会收集partition 级的统计信息的,也许是还未来得及收集 。
下面是GRANULARITY不同值时统计信息收集的级别
Granularity Global Statistics Level Global Table Partition Partition SubPartition SubPartition Default Table + Partition All Table + Partition + Subpartition
如果global and partition 都无,那就动态采样(default如果没禁用Dynamic Sample),如果global and partition 都有没有疑问,如果有partition 无global 那多分区扫时可以聚合,但是如果有global 无partition 时,单分区的CBO代价是如何算?多分区扫又如何记录呢?带个这个问题我来做个测试
create table anbob.t(id int,ctime date,name varchar2(100)) partition by range(ctime) ( partition p1 values less than (TO_DATE('2015-1-1','yyyy-mm-dd')), partition p2 values less than (TO_DATE('2015-2-1','yyyy-mm-dd')), partition p3 values less than (TO_DATE('2015-3-1','yyyy-mm-dd')), partition p4 values less than (TO_DATE('2015-4-1','yyyy-mm-dd')), partition p5 values less than (TO_DATE('2015-5-1','yyyy-mm-dd')), partition p6 values less than (TO_DATE('2015-6-1','yyyy-mm-dd')), partition p7 values less than (TO_DATE('2015-7-1','yyyy-mm-dd')), partition p8 values less than (TO_DATE('2015-8-1','yyyy-mm-dd')), partition p9 values less than (TO_DATE('2015-9-1','yyyy-mm-dd')), partition p10 values less than (TO_DATE('2015-10-1','yyyy-mm-dd')), partition p11 values less than (TO_DATE('2015-11-1','yyyy-mm-dd')), partition p12 values less than (TO_DATE('2015-12-1','yyyy-mm-dd')), partition p13 values less than (TO_DATE('2016-1-1','yyyy-mm-dd')), partition p14 values less than (TO_DATE('2016-2-1','yyyy-mm-dd')), partition p15 values less than (TO_DATE('2016-3-1','yyyy-mm-dd')), partition p16 values less than (TO_DATE('2016-4-1','yyyy-mm-dd')), partition p17 values less than (TO_DATE('2016-5-1','yyyy-mm-dd')), partition p18 values less than (TO_DATE('2016-6-1','yyyy-mm-dd')), partition p19 values less than (TO_DATE('2016-7-1','yyyy-mm-dd')), partition p20 values less than (TO_DATE('2016-8-1','yyyy-mm-dd')) ); SQL> insert into anbob.t select dbms_random.random, sysdate, 'anbob'||rownum from dual connect by rownum<=1E6; 1000000 rows created. SQL> commit; Commit complete. SQL> select * from anbob.t where rownum<10; ID CTIME NAME ---------- ----------------- -------------- -396539064 20150818 11:24:31 anbob989 -108530760 20150818 11:24:31 anbob990 -944850610 20150818 11:24:31 anbob991 650366221 20150818 11:24:31 anbob992 487828661 20150818 11:24:31 anbob993 1673114746 20150818 11:24:31 anbob994 1278669788 20150818 11:24:31 anbob995 528234050 20150818 11:24:31 anbob996 2013900942 20150818 11:24:31 anbob997 9 rows selected. SQL> @seg anbob.t SEG_MB OWNER SEGMENT_NAME SEG_PART_NAME SEGMENT_TYPE SEG_TABLESPACE_NAME BLOCKS HDRFIL HDRBLK ---------- ----------- ----------------- --------------- -------------------- --------------------- ---------- ---------- ---------- 36 ANBOB T P9 TABLE PARTITION USERS 4608 36 3437899 SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'ANBOB' , tabname => 'T',cascade => true, estimate_percent => dbms_stats.auto_sample_size,method_opt=>'for all columns size 1', granularity => 'GLOBAL', degree => 8,no_invalidate=>false); PL/SQL procedure successfully completed. SQL> @tab anbob.t Show tables matching condition "%anbob.t%" (if schema is not specified then current users tables only are shown)... OWNER TABLE_NAME TYPE NUM_ROWS BLOCKS EMPTY AVGSPC ROWLEN TAB_LAST_ANALYZED DEGREE COMPRESS -------------------- ------------------------------ ---- ------------ ------------- --------- ------ ------ ----------------- ---------- -------- ANBOB T PTAB 1000000 4528 0 0 27 20150818 11:29:13 1 SQL> @tabpart anbob.t TABLE_OWNER TABLE_NAME POS COM PARTITION_NAME NUM_ROWS SUBPARTITION_COUNT HIGH_VALUE_RAW HIGH_VALUE_LENGTH COMPRESS COMPRESS_FOR ---------------- ----------------- ---------- --- ------------------ ---------- ------------------ -------------------------------------- ----------------- -------- ------------ ANBOB T 1 NO P1 0 TO_DATE(' 2015-01-01 00:00:00', ) 83 DISABLED ANBOB T 2 NO P2 0 TO_DATE(' 2015-02-01 00:00:00', ) 83 DISABLED ANBOB T 3 NO P3 0 TO_DATE(' 2015-03-01 00:00:00', ) 83 DISABLED ANBOB T 4 NO P4 0 TO_DATE(' 2015-04-01 00:00:00', ) 83 DISABLED ANBOB T 5 NO P5 0 TO_DATE(' 2015-05-01 00:00:00', ) 83 DISABLED ANBOB T 6 NO P6 0 TO_DATE(' 2015-06-01 00:00:00', ) 83 DISABLED ANBOB T 7 NO P7 0 TO_DATE(' 2015-07-01 00:00:00', ) 83 DISABLED ANBOB T 8 NO P8 0 TO_DATE(' 2015-08-01 00:00:00', ) 83 DISABLED ANBOB T 9 NO P9 0 TO_DATE(' 2015-09-01 00:00:00', ) 83 DISABLED ANBOB T 10 NO P10 0 TO_DATE(' 2015-10-01 00:00:00', ) 83 DISABLED ANBOB T 11 NO P11 0 TO_DATE(' 2015-11-01 00:00:00', ) 83 DISABLED ANBOB T 12 NO P12 0 TO_DATE(' 2015-12-01 00:00:00', ) 83 DISABLED ANBOB T 13 NO P13 0 TO_DATE(' 2016-01-01 00:00:00', ) 83 DISABLED ANBOB T 14 NO P14 0 TO_DATE(' 2016-02-01 00:00:00', ) 83 DISABLED ANBOB T 15 NO P15 0 TO_DATE(' 2016-03-01 00:00:00', ) 83 DISABLED ANBOB T 16 NO P16 0 TO_DATE(' 2016-04-01 00:00:00', ) 83 DISABLED ANBOB T 17 NO P17 0 TO_DATE(' 2016-05-01 00:00:00', ) 83 DISABLED ANBOB T 18 NO P18 0 TO_DATE(' 2016-06-01 00:00:00', ) 83 DISABLED ANBOB T 19 NO P19 0 TO_DATE(' 2016-07-01 00:00:00', ) 83 DISABLED ANBOB T 20 NO P20 0 TO_DATE(' 2016-08-01 00:00:00', ) 83 DISABLED 20 rows selected.
note:
现在我们创建了个分区表,有20个分区,insert 了1000000记录, 收集统计信息只收集了global级的。真正的数据都在p9分区,其它分区空。
SQL> select * from anbob.t ; 1000000 rows selected. Execution Plan ---------------------------------------------------------- Plan hash value: 1951471977 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1000K| 25M| 1245 (2)| 00:00:15 | | | | 1 | PARTITION RANGE ALL| | 1000K| 25M| 1245 (2)| 00:00:15 | 1 | 20 | | 2 | TABLE ACCESS FULL | T | 1000K| 25M| 1245 (2)| 00:00:15 | 1 | 20 | -------------------------------------------------------------------------------------------- SQL> select * from anbob.t partition(p18); no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 4030076868 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 50000 | 1318K| 64 (2)| 00:00:01 | | | | 1 | PARTITION RANGE SINGLE| | 50000 | 1318K| 64 (2)| 00:00:01 | 18 | 18 | | 2 | TABLE ACCESS FULL | T | 50000 | 1318K| 64 (2)| 00:00:01 | 18 | 18 | ----------------------------------------------------------------------------------------------- exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'ANBOB' , tabname => 'T',cascade => true, estimate_percent => dbms_stats.auto_sample_size,method_opt=>'for all columns size 1', granularity => 'ALL', degree => 8,no_invalidate=>false); SQL> select * from anbob.t partition(p18); Execution Plan ---------------------------------------------------------- Plan hash value: 4030076868 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 74 | 2 (0)| 00:00:01 | | | | 1 | PARTITION RANGE SINGLE| | 1 | 74 | 2 (0)| 00:00:01 | 18 | 18 | | 2 | TABLE ACCESS FULL | T | 1 | 74 | 2 (0)| 00:00:01 | 18 | 18 | -----------------------------------------------------------------------------------------------
note:
可以看那单分区扫描p18时,CBO 估算行数时使用的是global统计信息的num_rows/number of partitions ,那刚好就1000000/20=50000。cost 也几乎是全表招的cost / number of partitions. 如果收集了所有的统计信息时才是正确的cost和估算的有很大差距。
exec dbms_stats.delete_table_stats(ownname => 'ANBOB' , tabname => 'T'); SQL> select * from anbob.t ; Execution Plan ---------------------------------------------------------- Plan hash value: 1951471977 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 900K| 63M| 1243 (2)| 00:00:15 | | | | 1 | PARTITION RANGE ALL| | 900K| 63M| 1243 (2)| 00:00:15 | 1 | 20 | | 2 | TABLE ACCESS FULL | T | 900K| 63M| 1243 (2)| 00:00:15 | 1 | 20 | -------------------------------------------------------------------------------------------- Note ----- - dynamic sampling used for this statement (level=2) SQL> select * from anbob.t partition(p18); Execution Plan ---------------------------------------------------------- Plan hash value: 4030076868 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 74 | 2 (0)| 00:00:01 | | | | 1 | PARTITION RANGE SINGLE| | 1 | 74 | 2 (0)| 00:00:01 | 18 | 18 | | 2 | TABLE ACCESS FULL | T | 1 | 74 | 2 (0)| 00:00:01 | 18 | 18 | ----------------------------------------------------------------------------------------------- Note ----- - dynamic sampling used for this statement (level=2)
Note:
如果统计信息global and partition level都没有,使用的动态采样也能正确的估算数据。
如果是只有global statistics,多分区范围扫描呢?
SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'ANBOB' , tabname => 'T',cascade => true, estimate_percent => dbms_stats.auto_sample_size,method_opt=>'for all columns size 1', granularity => 'GLOBAL', degree => 8,no_invalidate=>false); PL/SQL procedure successfully completed. SQL> select * from anbob.t where ctime between TO_DATe('2016-5-1','yyyy-mm-dd') and TO_DATe('2016-7-1','yyyy-mm-dd') ; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 3249057999 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 27 | 84 (2)| 00:00:02 | | | | 1 | PARTITION RANGE ITERATOR| | 1 | 27 | 84 (2)| 00:00:02 | 18 | 20 | |* 2 | TABLE ACCESS FULL | T | 1 | 27 | 84 (2)| 00:00:02 | 18 | 20 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CTIME"<=TO_DATE(' 2016-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) SQL> select * from anbob.t where ctime between TO_DATe('2016-4-1','yyyy-mm-dd') and TO_DATe('2016-7-1','yyyy-mm-dd') ; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 3249057999 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 27 | 112 (2)| 00:00:02 | | | | 1 | PARTITION RANGE ITERATOR| | 1 | 27 | 112 (2)| 00:00:02 | 17 | 20 | |* 2 | TABLE ACCESS FULL | T | 1 | 27 | 112 (2)| 00:00:02 | 17 | 20 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CTIME"<=TO_DATE(' 2016-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) SQL> select * from anbob.t where ctime between TO_DATe('2016-3-1','yyyy-mm-dd') and TO_DATe('2016-7-1','yyyy-mm-dd') ; Execution Plan ---------------------------------------------------------- Plan hash value: 3249057999 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 27 | 139 (2)| 00:00:02 | | | | 1 | PARTITION RANGE ITERATOR| | 1 | 27 | 139 (2)| 00:00:02 | 16 | 20 | |* 2 | TABLE ACCESS FULL | T | 1 | 27 | 139 (2)| 00:00:02 | 16 | 20 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CTIME"<=TO_DATE(' 2016-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Note:
虽然是估算的rows没太大错误,但是cost确发现了个规律每加一个分区约增加cost 28左右.
如果删除统计信息后
SQL> select /*53*/* from anbob.t where ctime between TO_DATe('2016-5-1','yyyy-mm-dd') and TO_DATe('2016-7-1','yyyy-mm-dd') ; Execution Plan ---------------------------------------------------------- Plan hash value: 3249057999 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 74 | 2 (0)| 00:00:01 | | | | 1 | PARTITION RANGE ITERATOR| | 1 | 74 | 2 (0)| 00:00:01 | 18 | 20 | |* 2 | TABLE ACCESS FULL | T | 1 | 74 | 2 (0)| 00:00:01 | 18 | 20 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CTIME"<=TO_DATE(' 2016-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) Note ----- - dynamic sampling used for this statement (level=2
Note:
动态采样也可以正确的估算COST。
如多分区范围扫描的COST是怎么算的呢?会不会也是从global statistics 计算来的呢? 很好证明,我们把global statistics的rows 和blocks 增加10倍再查询同样的SQL看COST会不会也相应增加?
SQL> exec DBMS_STATS.GATHER_TABLE_STATS (ownname => 'ANBOB' , tabname => 'T',cascade => true, estimate_percent => dbms_stats.auto_sample_size,method_opt=>'for all columns size 1', granularity => 'GLOBAL', degree => 8,no_invalidate=>false); PL/SQL procedure successfully completed. exec dbms_stats.set_table_stats(ownname => 'ANBOB' , tabname => 'T',numrows=>10000000,numblks=>45280); SQL> select * from anbob.t ; Execution Plan ---------------------------------------------------------- Plan hash value: 1951471977 -------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | -------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10M| 257M| 12430 (2)| 00:02:30 | | | | 1 | PARTITION RANGE ALL| | 10M| 257M| 12430 (2)| 00:02:30 | 1 | 20 | | 2 | TABLE ACCESS FULL | T | 10M| 257M| 12430 (2)| 00:02:30 | 1 | 20 | -------------------------------------------------------------------------------------------- SQL> select * from anbob.t partition(p18); Execution Plan ---------------------------------------------------------- Plan hash value: 4030076868 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 500K| 12M| 623 (2)| 00:00:08 | | | | 1 | PARTITION RANGE SINGLE| | 500K| 12M| 623 (2)| 00:00:08 | 18 | 18 | | 2 | TABLE ACCESS FULL | T | 500K| 12M| 623 (2)| 00:00:08 | 18 | 18 | ----------------------------------------------------------------------------------------------- SQL> select * from anbob.t where ctime between TO_DATe('2016-5-1','yyyy-mm-dd') and TO_DATe('2016-5-31','yyyy-mm-dd') ; no rows selected Execution Plan ---------------------------------------------------------- Plan hash value: 4030076868 ----------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ----------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 27 | 624 (2)| 00:00:08 | | | | 1 | PARTITION RANGE SINGLE| | 1 | 27 | 624 (2)| 00:00:08 | 18 | 18 | |* 2 | TABLE ACCESS FULL | T | 1 | 27 | 624 (2)| 00:00:08 | 18 | 18 | ----------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CTIME"<=TO_DATE(' 2016-05-31 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
NOTE:
可以发现全分区扫和单分区扫的ROWS和COST都增加了10倍,下面看多分区范围扫描
SQL> select * from anbob.t where ctime between TO_DATe('2016-5-1','yyyy-mm-dd') and TO_DATe('2016-7-1','yyyy-mm-dd') ; Execution Plan ---------------------------------------------------------- Plan hash value: 3249057999 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 27 | 84 (2)| 00:00:02 | | | | 1 | PARTITION RANGE ITERATOR| | 1 | 27 | 84 (2)| 00:00:02 | 18 | 20 | |* 2 | TABLE ACCESS FULL | T | 1 | 27 | 84 (2)| 00:00:02 | 18 | 20 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CTIME"<=TO_DATE(' 2016-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) SQL> select * from anbob.t where ctime between TO_DATe('2016-4-1','yyyy-mm-dd') and TO_DATe('2016-7-1','yyyy-mm-dd') ; Execution Plan ---------------------------------------------------------- Plan hash value: 3249057999 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 27 | 112 (2)| 00:00:02 | | | | 1 | PARTITION RANGE ITERATOR| | 1 | 27 | 112 (2)| 00:00:02 | 17 | 20 | |* 2 | TABLE ACCESS FULL | T | 1 | 27 | 112 (2)| 00:00:02 | 17 | 20 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CTIME"<=TO_DATE(' 2016-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Note:
可以看到修改global stats, 不会影响多分区扫的COST,还和上面得到的一致。得出结论这是计算的COST不是利用的GLOBAL statis.
alter table anbob.t add partition p21 values less than (TO_DATE('2016-9-1','yyyy-mm-dd')); alter table anbob.t add partition p22 values less than (TO_DATE('2016-10-1','yyyy-mm-dd')); alter table anbob.t add partition p23 values less than (TO_DATE('2016-11-1','yyyy-mm-dd')); alter table anbob.t add partition p24 values less than (TO_DATE('2016-12-1','yyyy-mm-dd')); alter table anbob.t add partition p25 values less than (TO_DATE('2017-1-1','yyyy-mm-dd')); alter table anbob.t add partition p26 values less than (TO_DATE('2017-2-1','yyyy-mm-dd')); alter table anbob.t add partition p27 values less than (TO_DATE('2017-3-1','yyyy-mm-dd')); alter table anbob.t add partition p28 values less than (TO_DATE('2017-4-1','yyyy-mm-dd')); alter table anbob.t add partition p29 values less than (TO_DATE('2017-5-1','yyyy-mm-dd')); alter table anbob.t add partition p30 values less than (TO_DATE('2017-6-1','yyyy-mm-dd')); SQL> select /*53*/* from anbob.t where ctime between TO_DATe('2016-5-1','yyyy-mm-dd') and TO_DATe('2016-7-1','yyyy-mm-dd') ; Execution Plan ---------------------------------------------------------- Plan hash value: 3249057999 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 27 | 84 (2)| 00:00:02 | | | | 1 | PARTITION RANGE ITERATOR| | 1 | 27 | 84 (2)| 00:00:02 | 18 | 20 | |* 2 | TABLE ACCESS FULL | T | 1 | 27 | 84 (2)| 00:00:02 | 18 | 20 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CTIME"<=TO_DATE(' 2016-07-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Note:
和分区数也没有直接关系,如果研究这个问题我们可以做10053 trace计划代价是怎么来的。
SQL> select owner,table_name,PARTITION_NAME,PARTITION_POSITION,NUM_ROWS,BLOCKS,AVG_ROW_LEN from dba_tab_statistics where table_name='T'; OWNER TABLE_NAME PARTITION_NAME PARTITION_POSITION NUM_ROWS BLOCKS AVG_ROW_LEN -------------- ------------------ -------------------- ------------------ ---------- ---------- ----------- ANBOB T 10000000 45280 27 ANBOB T P1 1 ANBOB T P2 2 ANBOB T P3 3 ANBOB T P4 4 ANBOB T P5 5 ANBOB T P6 6 ANBOB T P7 7 ANBOB T P8 8 ANBOB T P9 9 ANBOB T P10 10 ANBOB T P11 11 ANBOB T P12 12 ANBOB T P13 13 ANBOB T P14 14 ANBOB T P15 15 ANBOB T P16 16 ANBOB T P17 17 ANBOB T P18 18 ANBOB T P19 19 ANBOB T P20 20 SQL> alter session set events '10053 trace name context forever, level 2'; Session altered. SQL> explain plan for 2 select /*53*/* from anbob.t where ctime between TO_DATe('2016-5-1','yyyy-mm-dd') and TO_DATe('2016-7-1','yyyy-mm-dd') ; Explained. SQL> alter session set events '10053 trace name context off'; Session altered. SQL> oradebug setmypid Statement processed. SQL> oradebug tracefile_name /oracle/app/oracle/diag/rdbms/im/im1/trace/im1_ora_22398.trc
# 10053 trace file
CPUSPEED - CPU Speed IOTFRSPEED - I/O transfer speed IOSEEKTIM - I/O seek time SREADTIM - average single block read time MREADTIM - average multiblock read time MBRC - average multiblock read count db_file_multiblock_read_count = 128 _optimizer_block_size = 8192 ----------------------------- SYSTEM STATISTICS INFORMATION ----------------------------- Using NOWORKLOAD Stats CPUSPEEDNW: 1122 millions instructions/sec (default is 100) IOTFRSPEED: 4096 bytes per millisecond (default is 4096) IOSEEKTIM: 10 milliseconds (default is 10) MBRC: NO VALUE blocks (default is 8) *************************************** BASE STATISTICAL INFORMATION *********************** Table Stats:: Table: T Alias: T (Using composite stats) (making adjustments for partition skews) ORIGINAL VALUES:: #Rows: 10000000 #Blks: 45280 AvgRowLen: 27.00 ChainCnt: 0.00 PARTITIONS:: PRUNED: 3 ANALYZED: 0 UNANALYZED: 3 #Rows: 10000000 #Blks: 300 AvgRowLen: 27.00 ChainCnt: 0.00 Access path analysis for T *************************************** SINGLE TABLE ACCESS PATH Single Table Cardinality Estimation for T[T] Column (#2): CTIME( AvgLen: 8 NDV: 1 Nulls: 0 Density: 1.000000 Min: 2457253 Max: 2457253 Table: T Alias: T Card: Original: 10000000.000000 Rounded: 1 Computed: 1.00 Non Adjusted: 1.00 Access Path: TableScan Cost: 84.24 Resp: 84.24 Degree: 0 Cost_io: 83.00 Cost_cpu: 16712523 Resp_io: 83.00 Resp_cpu: 16712523 Best:: AccessPath: TableScan Cost: 84.24 Degree: 1 Resp: 84.24 Card: 1.00 Bytes: 0 *************************************** OPTIMIZER STATISTICS AND COMPUTATIONS *************************************** GENERAL PLANS *************************************** Considering cardinality-based initial join order. Permutations for Starting Table :0 Join order[1]: T[T]#0 *********************** Best so far: Table#: 0 cost: 84.2411 card: 1.0000 bytes: 27 *********************** (newjo-stop-1) k:0, spcnt:0, perm:1, maxperm:2000 SQL> select pname,pval1 from aux_stats$ where sname='SYSSTATS_MAIN'; PNAME PVAL1 ------------------------------ ---------- CPUSPEEDNW 1122.16321 IOSEEKTIM 10 IOTFRSPEED 4096 SREADTIM MREADTIM CPUSPEED 1177 MBRC MAXTHR SLAVETHR
SREADTIM=IOSEEKTIM+db_block_size/IOTFRSPEED
=10+8192/4096
=12
MREADTIM=IOSEEKTIM+MBRC*db_block_size/IOTFRSPEED
=10+8*8192/4096
=26
#MRDS=#blks/MBRC
=300/8
Cost=(I/O COST)+(CPU COST)
=(1+ceil(#MRDS*MREADTIM/SREADTIM))+(cost_cpu/CPUSPEED/1000/SREADTIM)
=(1+ceil(300/8*26/12))+(16712523/1177/1000/12)
=83+1
note:
通过计算我可以确认估算UNANALYZED分区的blocks 使用的是300,而不是45280, 300 是怎么来的呢? 从ORACLE的官方文档找到了答案,是一个默认值。
”
Handling Missing Statistics
When Oracle encounters a table with missing statistics, Oracle dynamically gathers the necessary statistics needed by the optimizer. However, for certain types of tables, Oracle does not perform dynamic sampling. These include remote tables and external tables. In those cases and also when dynamic sampling has been disabled, the optimizer uses default values for its statistics, shown in Table 15-3 and Table 15-4.
Table 15-3 Default Table Values When Statistics Are Missing
Table Statistic Default Value Used by Optimizer
Cardinality num_of_blocks * (block_size – cache_layer) / avg_row_len
Average row length 100 bytes
Number of blocks 100 or actual value based on the extent map <================
Remote cardinality 2000 rows
Remote average row length 100 bytes
http://docs.oracle.com/cd/B13789_01/server.101/b10752/stats.htm
”
我再加多查询一个分区试试
SQL> explain plan for 2 select /*53*/* from anbob.t where ctime between TO_DATe('2016-5-1','yyyy-mm-dd') and TO_DATe('2016-8-1','yyyy-mm-dd') ; Explained. *************************************** BASE STATISTICAL INFORMATION *********************** Table Stats:: Table: T Alias: T (Using composite stats) (making adjustments for partition skews) ORIGINAL VALUES:: #Rows: 10000000 #Blks: 45280 AvgRowLen: 27.00 ChainCnt: 0.00 PARTITIONS:: PRUNED: 4 ANALYZED: 0 UNANALYZED: 4 #Rows: 10000000 #Blks: 400 AvgRowLen: 27.00 ChainCnt: 0.00
NOTE:
#BLKS 的确是默认值100* PRUNED partitions数, 当然也不并是所有的都取100,就像文档里提的100 or actual value based on the extent map, 比如下面如果带上有数据分区,应该参考的就是extent map得来的值。
SQL> select /*53*/* from anbob.t where ctime between TO_DATe('2015-8-1','yyyy-mm-dd') and TO_DATe('2015-9-1','yyyy-mm-dd') ; Execution Plan ---------------------------------------------------------- Plan hash value: 3249057999 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10M| 257M| 1293 (3)| 00:00:16 | | | | 1 | PARTITION RANGE ITERATOR| | 10M| 257M| 1293 (3)| 00:00:16 | 9 | 10 | |* 2 | TABLE ACCESS FULL | T | 10M| 257M| 1293 (3)| 00:00:16 | 9 | 10 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CTIME"<=TO_DATE(' 2015-09-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) SQL> select /*53*/* from anbob.t where ctime between TO_DATe('2015-8-1','yyyy-mm-dd') and TO_DATe('2015-10-1','yyyy-mm-dd') ; Execution Plan ---------------------------------------------------------- Plan hash value: 3249057999 ------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 10M| 257M| 1320 (3)| 00:00:16 | | | | 1 | PARTITION RANGE ITERATOR| | 10M| 257M| 1320 (3)| 00:00:16 | 9 | 11 | |* 2 | TABLE ACCESS FULL | T | 10M| 257M| 1320 (3)| 00:00:16 | 9 | 11 | ------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("CTIME"<=TO_DATE(' 2015-10-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
summary:
分区表的统计信息分为table level and parition level(先不说subpartition), 当CBO 在计算访问路径的COST时,如果table level and partition 都不存在统计信息,使用动态采样反而采集的数据没有问题,或都的确存在正确的统计信息时也都可以生成正确的执行计划,只有partition 级也不存在问题,但是如果只有table 级没有partition级时,多分区扫描时有可能会错误的利用 每个分区100 blocks生成较低的cost, 就像我们遇到的案例短时间内生成大量的数据到新分区,统计信息还没来的及收集, 而没有使用上正确的索引,后期我们是手动提前set 统计信息然后lock, 问题没有在发生。 具体案例具体分析,最好用数据说话。
— 致敬Thomas Kyte
对不起,这篇文章暂时关闭评论。