问题的起初是有个网友问我一个付费查询的sql如何再优,我在本机其它系统的环境找找了个相似结构的两个表,就是一对多。
icme_student(小表) ic_code唯一,对应icme_noproject_score(大表)的多条记录每个ic_code.
下面看我执行的执行计划,为了节约篇幅省略了一部分,并且都是第二次运行的结果
SQL> select count(*) total,count(distinct stu.ic_code) hav_sc_stucnt,sum(ns.class_score),stu.ic_code from icme_noproject_score ns
2 join icme_student stu on ns.ic_code=stu.ic_code
3 group by stu.ic_code
4 ;
854409 rows selected.
Elapsed: 00:01:38.28
Execution Plan
----------------------------------------------------------
Plan hash value: 1102099122
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 608K| 13M| | 121K (13)| 00:24:13 |
| 1 | SORT GROUP BY | | 608K| 13M| 648M| 121K (13)| 00:24:13 |
| 2 | NESTED LOOPS | | 20M| 470M| | 66742 (22)| 00:13:21 |
| 3 | INDEX FAST FULL SCAN| UK_NOPROJECT_SCORE | 20M| 274M| | 47986 (1)| 00:09:36 |
|* 4 | INDEX RANGE SCAN | PK_ICME_STUDENT | 1 | 10 | | 1 (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
41430942 consistent gets
默认CBO的执行计划运行了98秒,产生了41430942 consistent gets,比较怀疑的是为什么用要用大表做为驱动表?
说点题外话,在两个表的JOIN 中,CBO总是会计算三种JOIN的连接方式的代价,分别为NL JOIN,SM JOIN,HA JOIN,三种JOIN对前后两表的我称呼还不一样,分别是NL JOIN :outer table,inner table,SM :firest table,second table,HA:build table ,probe table,Nested jOIN中的outer table又叫driver table,是执行计划中上面的表,是循环中的外层表,如其它语言中的
FOR I IN 1..10 LOOP
FOR J IN 1...100000 LOOP
if j.xx=i.xx do something
end loop;
end loop;
i就是驱动表的rows,j就是inner table的记录,一般情况下是把小表做为驱动表。
我加上ordered hint改变驱动表
select /*+ ordered */ count(*) total,count(distinct stu.ic_code) hav_sc_stucnt,sum(ns.class_score),stu.ic_code from icme_student stu
join icme_noproject_score ns on stu.ic_code=ns.ic_code
group by stu.ic_code
854409 rows selected.
Elapsed: 00:00:34.78
Execution Plan
----------------------------------------------------------
Plan hash value: 2339117530
-----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time |
-----------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 608K| 13M| | 129K (2)| 00:26:00 |
| 1 | SORT GROUP BY | | 608K| 13M| 648M| 129K (2)| 00:26:00 |
|* 2 | HASH JOIN | | 20M| 470M| 24M| 75642 (1)| 00:15:08 |
| 3 | INDEX FAST FULL SCAN| PK_ICME_STUDENT | 1179K| 11M| | 956 (1)| 00:00:12 |
| 4 | INDEX FAST FULL SCAN| UK_NOPROJECT_SCORE | 20M| 274M| | 47986 (1)| 00:09:36 |
-----------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
222813 consistent gets
运行时间变成了34秒,consistent gets也相对少了很多,而且变成了HA JOIN,小表主键也变成了IFFS,如果加上并行呢?
再说点题外话,如果是DBA用或就是那么一小摄儿人查询几次,在服务器有多核CPU,且IO资源有空闲的情况下可以开并行减少查询时间,但是对于app,在oltp系统中并发用户又很多,进行的都是非常短的事务,这时并发可能会给服务器增加压力和一些稀缺资源的竞争,并发受parallel_threads_per_cpu 等多个系统参数的影响,因为一般用户如果查询服务器cpu的核数呢?
SQL> DECLARE
2 parnam VARCHAR2(256);
3 intval BINARY_INTEGER;
4 strval VARCHAR2(256);
5 partyp BINARY_INTEGER;
6 BEGIN
7 partyp := dbms_utility.get_parameter_value('cpu_count',
8 intval, strval);
9 dbms_output.put('parameter value is: ');
10 IF partyp = 1 THEN
11 dbms_output.put_line(strval);
12 ELSE
13 dbms_output.put_line(intval);
14 END IF;
15
16 END;
17 /
parameter value is: 8
alter session force parallel query;
SQL> select count(*) total,count(distinct stu.ic_code) hav_sc_stucnt,sum(ns.class_score),stu.ic_code from icme_noproject_score ns
2 join icme_student stu on ns.ic_code=stu.ic_code
3 group by stu.ic_code
4 ;
854409 rows selected.
Elapsed: 00:00:20.62
Execution Plan
----------------------------------------------------------
Plan hash value: 4250446769
-------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 619K| 14M| | 3227 (3)| 00:00:39 | | | |
| 1 | PX COORDINATOR | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 619K| 14M| | 3227 (3)| 00:00:39 | Q1,02 | P->S | QC (RAND) |
| 3 | SORT GROUP BY | | 619K| 14M| 649M| 3227 (3)| 00:00:39 | Q1,02 | PCWP | |
|* 4 | HASH JOIN | | 20M| 471M| | 3167 (1)| 00:00:39 | Q1,02 | PCWP | |
| 5 | PX RECEIVE | | 1181K| 11M| | 53 (0)| 00:00:01 | Q1,02 | PCWP | |
| 6 | PX SEND HASH | :TQ10000 | 1181K| 11M| | 53 (0)| 00:00:01 | Q1,00 | P->P | HASH |
| 7 | PX BLOCK ITERATOR | | 1181K| 11M| | 53 (0)| 00:00:01 | Q1,00 | PCWC | |
| 8 | INDEX FAST FULL SCAN| PK_ICME_STUDENT | 1181K| 11M| | 53 (0)| 00:00:01 | Q1,00 | PCWP | |
| 9 | PX RECEIVE | | 20M| 274M| | 3107 (1)| 00:00:38 | Q1,02 | PCWP | |
| 10 | PX SEND HASH | :TQ10001 | 20M| 274M| | 3107 (1)| 00:00:38 | Q1,01 | P->P | HASH |
| 11 | PX BLOCK ITERATOR | | 20M| 274M| | 3107 (1)| 00:00:38 | Q1,01 | PCWC | |
| 12 | INDEX FAST FULL SCAN| UK_NOPROJECT_SCORE | 20M| 274M| | 3107 (1)| 00:00:38 | Q1,01 | PCWP | |
-------------------------------------------------------------------------------------------------------------------------------
Statistics
----------------------------------------------------------
181 recursive calls
3 db block gets
226981 consistent gets
再加上ORDERED提示
SQL> select /*+ ordered */ count(*) total,count(distinct stu.ic_code) hav_sc_stucnt,sum(ns.class_score),stu.ic_code from icme_student stu
join icme_noproject_score ns on stu.ic_code=ns.ic_code
3 group by stu.ic_code;
854409 rows selected.
Elapsed: 00:00:16.88
Execution Plan
----------------------------------------------------------
Plan hash value: 4250446769
---------------------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes |TempSpc| Cost (%CPU)| Time | TQ |IN-OUT| PQ Distrib |
---------------------------------------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 619K| 14M| | 3227 (3)| 00:00:39 | | | |
| 1 | PX COORDINATOR | | | | | | | | | |
| 2 | PX SEND QC (RANDOM) | :TQ10002 | 619K| 14M| | 3227 (3)| 00:00:39 | Q1,02 | P->S | QC (RAND) |
| 3 | SORT GROUP BY | | 619K| 14M| 649M| 3227 (3)| 00:00:39 | Q1,02 | PCWP | |
|* 4 | HASH JOIN | | 20M| 471M| | 3167 (1)| 00:00:39 | Q1,02 | PCWP | |
| 5 | PX RECEIVE | | 1181K| 11M| | 53 (0)| 00:00:01 | Q1,02 | PCWP | |
| 6 | PX SEND HASH | :TQ10000 | 1181K| 11M| | 53 (0)| 00:00:01 | Q1,00 | P->P | HASH |
| 7 | PX BLOCK ITERATOR | | 1181K| 11M| | 53 (0)| 00:00:01 | Q1,00 | PCWC | |
| 8 | INDEX FAST FULL SCAN| PK_ICME_STUDENT | 1181K| 11M| | 53 (0)| 00:00:01 | Q1,00 | PCWP | |
| 9 | PX RECEIVE | | 20M| 274M| | 3107 (1)| 00:00:38 | Q1,02 | PCWP | |
| 10 | PX SEND HASH | :TQ10001 | 20M| 274M| | 3107 (1)| 00:00:38 | Q1,01 | P->P | HASH |
| 11 | PX BLOCK ITERATOR | | 20M| 274M| | 3107 (1)| 00:00:38 | Q1,01 | PCWC | |
| 12 | INDEX FAST FULL SCAN| UK_NOPROJECT_SCORE | 20M| 274M| | 3107 (1)| 00:00:38 | Q1,01 | PCWP | |
---------------------------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - access("NS"."IC_CODE"="STU"."IC_CODE")
Statistics
----------------------------------------------------------
117 recursive calls
3 db block gets
226981 consistent gets
对开始CBO默认解析做10053 trace
出于篇幅的原因也只是理出一些cost结果,如果要学习分析可以参考书的14章
语法
current session:
alter session events '10053 trace name context forever,level <1|2>';
execute sql
alter session events '10053 trace name context off';
other session:
exec dbms_system.set_ev(,,10053,<1|2>,'');
execute sql
exec dbms_system.set_ev(,,10053,0,'');
trace的内容结构是:
OPTIMIZER PARAMETERS:优化器参数
SYSTEM STATITICS:OS系统的统计信息如IO速度和CPU速度;
TABLE STATITISCS:表对象统计信息如行数,块数,平均行长度,有无直方图
INDEX STATISTICS:表上索引的统计信息如层数,叶子块数,distinct 键值数,聚集因子
COLUMNS STATISTICS:表信息
1,sigle table access cost: 单个表的连接方式的各种cost,如全表扫描(TABLE SCAN),最适合的索引读取方式IFFS\IFS\RANGE SCAN\INDEX AND-Equal,根据最小cost确定的连接方式
如
******** End index join costing ********
Best:: AccessPath: IndexFFS Index: PK_ICME_STUDENT
Cost: 955.76 Degree: 1 Resp: 955.76 Card: 1179881.00 Bytes: 0
***************************************
******** End index join costing ********
Best:: AccessPath: IndexFFS Index: UK_NOPROJECT_SCORE
Cost: 47986.14 Degree: 1 Resp: 47986.14 Card: 20565707.00 Bytes: 0
Grouping column cardinality [ IC_CODE] 1179881
***************************************
2,join order and method cost: 基于cardinality的join连接方式,CBO会考虑用不同的table join order(顺序) 选择outer table(外部表) 和inner table(内部表) NL\SM\HA JOIN 三种连接方式分别计算代价(外部表每返回一行都会去内部表去执行一次,所以在带统计信息的执行计划中的Starts列 PK_ICME_STUDENT 执行了 UK_NOPROJECT_SCORE 的行数次,且每次ICME_STUDENT 的e-rows中为什么会是1).
NL = NESTED JOIN
SM = SORT MERGE JOIN
HA = HASH JOIN
***********************
Join order[1]: ICME_STUDENT[STU]#0 ICME_NOPROJECT_SCORE[NS]#1
***************
Best NL cost: 1410028.31
SM cost: 155683.05
HA cost: 75642.10
Best:: JoinMethod: Hash
Cost: 129928.23 Degree: 1 Resp: 129928.23 Card: 20565707.00 Bytes: 24
***********************
Join order[2]: ICME_NOPROJECT_SCORE[NS]#1 ICME_STUDENT[STU]#0
***************
Best NL cost: 66742.49
SM cost: 155683.05
HA cost: 75642.10
Best:: JoinMethod: NestedLoop
Cost: 121028.62 Degree: 1 Resp: 121028.62 Card: 20565707.00 Bytes: 24
note:这里的Cost121028.62 是整个执行计划的总和,也就是执行计划图表中ID为0的cost 121K
***********************
Best so far: Table#: 1 cost: 47986.1401 card: 20565707.0000 bytes: 287919898
Table#: 0 cost: 121028.6238 card: 20565707.0000 bytes: 493576968
Number of join permutations tried: 2
*********************************
(newjo-save) [0 1 ]
GROUP BY cardinality: 608125.00, TABLE cardinality: 20565707.00
SORT resource Sort statistics
Sort width: 1198 Area size: 1048576 Max Area size: 209715200
Degree: 1
Blocks to Sort: 93161 Row size: 37 Total Rows: 20565707
Initial runs: 4 Merge passes: 1 IO Cost / pass: 50464
Total IO sort cost: 53219 Total CPU sort cost: 23713302516
Total Temp space used: 680231000
Final - All Rows Plan: Best join order: 2 --这里给出了终级JOIN 表顺序为2
Cost: 121028.6238 Degree: 1 Card: 20565707.0000 Bytes: 493576968
Resc: 121028.6238 Resc_io: 105627.0000 Resc_cpu: 342247398504
Resp: 121028.6238 Resp_io: 105627.0000 Resc_cpu: 342247398504
对于NL JOIN居然 ICME_STUDENT做outer table的cost是1410028.31,远远高出ICME_NOPROJECT_SCORE 做outer table 的66742.49
要学习10053 trace file 可以参考《Cost Based Oracle Fundamentals》书的14章,