PostgreSQL/openGauss explain解析(二): indexonlyscan cost
PostgreSQL系(openGAUSS)数据库中的所有索引都是二级索引, 数据表段( heap)和索引段(index)分别存储,通常对于多列表的SQL只返回或where中仅少量的列时,希望可以只从索引中检索,而不用再从索引回表返回数据(本篇不考虑可见性)提高查询效率,像在oracle中有index full scan和index fast full scan的执行计划,在Postgresql中也支持Btree index的indexonlyscan, MySQL中同样支持,但发现PostGreSQL默认配置的SQL优化器通常判断索引的cost大于表扫描,导致仅查询索引列也未使用索引,这里简单测试。
openGauss
创建测试表
anbob=# select version(); version ------------------------------------------------------------------------------------------------------------------------------------------------------ (openGauss 5.0.0 build a07d57c3) compiled at 2023-03-29 03:07:56 commit 0 last mr on x86_64-unknown-linux-gnu, compiled by g++ (GCC) 7.3.0, 64-bit (1 row) anbob=# create table testa(id int,name varchar(20) not null); CREATE TABLE anbob=# insert into testa select x,'anbob'||x from generate_series(1,100000) as x; INSERT 0 100000 anbob=# create index idx_name on testa(name); CREATE INDEX anbob=# analyze testa; ANALYZE anbob=# vacuum testa; VACUUM anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='testa'; relkind | relname | reltuples | relpages ---------+---------+-----------+---------- r | testa | 200000 | 1082 (1 row)
Note:
这里创建了testa并且插入了10W记录,在name列创建了索引
查看执行计划
anbob=# explain (analyze,buffers) select a.name from testa a;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Seq Scan on testa a (cost=0.00..3082.00 rows=200000 width=10) (actual time=0.007..38.215 rows=200000 loops=1)
(Buffers: shared hit=1082)
Total runtime: 61.343 ms
(3 rows)
Note:
这里仅查询索引name列,发现使用的还是seq scan. 前面已做vacuum。我觉应该是会走索引扫描,因为索引里包含所有列.
seq scan 计算COST
anbob=# \! sh show cost allocate_mem_cost | 0 | Sets the planner's estimate of the cost of allocate memory. autovacuum_vacuum_cost_delay | 20ms | Vacuum cost delay in milliseconds, for autovacuum. autovacuum_vacuum_cost_limit | -1 | Vacuum cost amount available before napping, for autovacuum. codegen_cost_threshold | 10000 | Decided to use LLVM optimization or not. cost_param | 0 | Bitmap controls the use of alternative cost model. cost_weight_index | 1 | Sets the planner's discount when evaluating index cost. cpu_index_tuple_cost | 0.005 | Sets the planner's estimate of the cost of processing each index entry during an index scan. cpu_operator_cost | 0.0025 | Sets the planner's estimate of the cost of processing each operator or function call. cpu_tuple_cost | 0.01 | Sets the planner's estimate of the cost of processing each tuple (row). enable_change_hjcost | off | Enable change hash join cost qrw_inlist2join_optmode | cost_base | Specify inlist2join opimitzation mode. random_page_cost | 4 | Sets the planner's estimate of the cost of a nonsequentially fetched disk page. resource_track_cost | 100000 | Sets the minimum cost to do resource track. seq_page_cost | 1 | Sets the planner's estimate of the cost of a sequentially fetched disk page. vacuum_cost_delay | 0 | Vacuum cost delay in milliseconds. vacuum_cost_limit | 200 | Vacuum cost amount available before napping. vacuum_cost_page_dirty | 20 | Vacuum cost for a page dirtied by vacuum. vacuum_cost_page_hit | 1 | Vacuum cost for a page found in the buffer cache. vacuum_cost_page_miss | 10 | Vacuum cost for a page not found in the buffer cache.
对于上面的全表查询COST 计算方法,依赖seq_page_cost和cpu_tuple_cost
Total cost of Seq Scan
= (estimated sequential page reads * seq_page_cost) + (estimated rows returned * cpu_tuple_cost)
= (1082 * 1) + (200000 * 0.01)
= 1082 + 2000.00
= 3082
如果增加where条件 COST
anbob=# explain (analyze,buffers) select a.name from testa a where id<10000;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Seq Scan on testa a (cost=0.00..3582.00 rows=20111 width=10) (actual time=0.022..37.919 rows=19998 loops=1)
Filter: (id < 10000)
Rows Removed by Filter: 180002
(Buffers: shared hit=1082)
Total runtime: 40.077 ms
(5 rows)
Note:
这里增加了where条件返回更少的行,但依旧是扫描了所有的行,COST并没有降低反而增加,因为CPU需要对where条件返回记录需要过滤,增加了cpu_operator_cost
Total cost of Seq Scan with WHERE
= (estimated sequential page reads * seq_page_cost) + (estimated rows returned * cpu_tuple_cost) + (estimated rows returned* cpu_operator_cost)
= (1082 * 1) + (200000 * 0.01) + (200000 * 0.0025)
= 1082 + 2000.00 + 500
= 3582
使用 indexonlyscan SQL HINT COST
anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='idx_name'; relkind | relname | reltuples | relpages ---------+----------+-----------+---------- i | idx_name | 200000 | 773 (1 row) anbob=# explain (analyze,buffers) select /*+indexonlyscan(a idx_name) */ a.name from testa a; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_name on testa a (cost=0.00..6092.25 rows=200000 width=10) (actual time=0.019..43.794 rows=200000 loops=1) Heap Fetches: 0 (Buffers: shared hit=770) Total runtime: 66.558 ms (5 rows)
Note:
这里使用indexonlyscan的COST要比cost高出很多,heap pages 1082,index pages 773这确实不太好理解。我们看一下index only scan的COST计算(I guass,not sure),注意在index scan时使用的是random IO, 所以COST使用的random_page_cost,及扫索引的cpu_index_tuple_cost.
(estimated index only scan page reads * random_page_cost) + (estimated rows returned* cpu_tuple_cost) + (estimated rows returned * cpu_index_tuple_cost)
=(773*4 )+(200000 * 0.01) +(200000* 0.005)
=3092 + 2000 + 1000
=6092
random_page_cost参数
因为random_page_cost默认为4,而seq_page_cost默认为1,这也是cost相差较大的原因,运行在SSD上的实例,random_page_cost设置为1.0~2.0之间,虽然服务器允许将random_page_cost设置的比seq_page_cost小,即使访问数据都在内存一般也只建议两者配置相等,可以对表空间级配置,相对于seq_page_cost,减少这个值将导致系统更倾向于使用索引扫描
减少random_page_cost和random_page_cost
既然推算出了cost的算法,我们常调小参数看是否不用hint使用索引
anbob=# set random_page_cost=1; SET anbob=# explain (analyze,buffers)select /*+indexonlyscan(a idx_name) */ a.name from testa a; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_name on testa a (cost=0.00..3773.25 rows=200000 width=10) (actual time=0.019..44.648 rows=200000 loops=1) Heap Fetches: 0 (Buffers: shared hit=770) Total runtime: 67.601 ms (5 rows) anbob=# set cpu_index_tuple_cost=0.004; SET anbob=# explain (analyze,buffers)select /*+indexonlyscan(a idx_name) */ a.name from testa a; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_name on testa a (cost=0.00..3573.25 rows=200000 width=10) (actual time=0.020..43.095 rows=200000 loops=1) Heap Fetches: 0 (Buffers: shared hit=770) Total runtime: 72.249 ms (5 rows) # 恢复默认值 anbob=# set cpu_index_tuple_cost=0.005; SET # 但为测试效果, 调整random_page_cost小于seq_page_cost anbob=# set random_page_cost=0.1; SET anbob=# explain (analyze,buffers)select /*+indexonlyscan(a idx_name) */ a.name from testa a; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_name on testa a (cost=0.00..3077.55 rows=200000 width=10) (actual time=0.019..42.693 rows=200000 loops=1) Heap Fetches: 0 (Buffers: shared hit=770) Total runtime: 66.196 ms (5 rows)
Note:
现在的COST 小于seq scan 的3082 , 我们去掉HINT试试
anbob=# explain (analyze,buffers)select a.name from testa a;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
[Bypass]
Index Only Scan using idx_name on testa a (cost=0.00..3077.55 rows=200000 width=10) (actual time=0.019..45.700 rows=200000 loops=1)
Heap Fetches: 0
(Buffers: shared hit=770)
Total runtime: 68.124 ms
(5 rows)
Note:
目前不使用hint,优化器已经可以使用index only scan, 这个从上面的公式看,除了random_page_cost原因,还就是表的列太小了,所以在pages上相差不大,下面我们创建一个相对宽点的表,是否可以使用索引。
openGauss=# create table testc(id int,name varchar(20),addr varchar(3000)); CREATE TABLE openGauss=# insert into testc select x,'anbob'||x,rpad('x',2000,'x') from generate_series(1,10000) as x; INSERT 0 10000 openGauss=# create index idx_testc_name on testc(name); CREATE INDEX openGauss=# vacuum analyze testc; VACUUM openGauss=# explain analyze select name from testc; QUERY PLAN --------------------------------------------------------------------------------------------------------- Seq Scan on testc (cost=0.00..203.00 rows=10000 width=9) (actual time=0.008..1.922 rows=10000 loops=1) Total runtime: 2.920 ms (2 rows) openGauss=# set random_page_cost=1; SET openGauss=# explain analyze select name from testc; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_testc_name on testc (cost=0.00..191.25 rows=10000 width=9) (actual time=0.016..1.994 rows=10000 loops=1) Heap Fetches: 0 Total runtime: 2.893 ms (4 rows)
NOTE:
现在仅把random_page_cost调整和seq_page_cost相等, 默认已经可以使用index only scan.,注意创建的testc的name列并没有非空约事,使用 index only scan, 那null也在index中记录?
index only scan对于null
anbob=# insert into testc values(-1,null,null); INSERT 0 1 anbob=# explain analyze select name from testc; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_testc_name on testc (cost=0.00..191.25 rows=10000 width=9) (actual time=0.026..2.473 rows=10001 loops=1) Heap Fetches: 107 Total runtime: 3.554 ms (4 rows) anbob=# select count(*),count(name) from testc; count | count -------+------- 10001 | 10000 (1 row) anbob=# explain analyze select name from testc where name is null; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_testc_name on testc (cost=0.00..1.27 rows=1 width=9) (actual time=0.013..0.014 rows=1 loops=1) Index Cond: (name IS NULL) Heap Fetches: 1 Total runtime: 0.073 ms (5 rows)
Note:
说明在POSTGRESQL中对于null是可以使用index only scan的。
以上在postgresql和opengauss测试相同,下面看看其他数据库。
MySQL
MYSQL_root@127.0.0.1 [anbob]> select version();
+-------------------+
| version() |
+-------------------+
| 8.0.20-commercial |
+-------------------+
1 row in set (0.00 sec)
MYSQL_root@127.0.0.1 [anbob]> create table testa(id int,name varchar(20));
Query OK, 0 rows affected (0.06 sec)
MYSQL_root@127.0.0.1 [anbob]> insert into testa
-> WITH RECURSIVE cte (n) AS
-> (
-> SELECT 1
-> UNION ALL
-> SELECT n + 1 FROM cte WHERE n < 100000 -> )
-> SELECT n,concat('anbob',n) name FROM cte;
ERROR 3636 (HY000): Recursive query aborted after 100000 iterations. Try increasing @@cte_max_recursion_depth to a larger value.
MYSQL_root@127.0.0.1 [anbob]> show variables like 'cte%';
+-------------------------+-------+
| Variable_name | Value |
+-------------------------+-------+
| cte_max_recursion_depth | 1000 |
+-------------------------+-------+
1 row in set (0.01 sec)
MYSQL_root@127.0.0.1 [anbob]> set cte_max_recursion_depth=100000;
Query OK, 0 rows affected (0.00 sec)
MYSQL_root@127.0.0.1 [anbob]> insert into testa
-> WITH RECURSIVE cte (n) AS
-> (
-> SELECT 1
-> UNION ALL
-> SELECT n + 1 FROM cte WHERE n < 100000 -> )
-> SELECT n,concat('anbob',n) name FROM cte;
Query OK, 100000 rows affected (1.89 sec)
Records: 100000 Duplicates: 0 Warnings: 0
MYSQL_root@127.0.0.1 [anbob]> create index idx_testa_name on testa(name);
Query OK, 0 rows affected (1.88 sec)
Records: 0 Duplicates: 0 Warnings: 0
MYSQL_root@127.0.0.1 [anbob]> explain format=tree select name from testa;
+--------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------+
| -> Index scan on testa using idx_testa_name (cost=10071.15 rows=99989) |
+--------------------------------------------------------------------------+
1 row in set (0.00 sec)
Note:
在MySQL中2列mysql依旧使用了index scan. 在MYSQL中同样存储null值,所以也可以不需要null约束。
Oracle
SQL> @desc testb Name Null? Type ------------------------------- -------- ---------------------------- 1 ID NUMBER(38) 2 NAME NOT NULL VARCHAR2(20) 3 ADDR VARCHAR2(3000) SQL> explain plan for select /*+gather_plan_statistics*/ name from testb; Explained. SQL> @x2 PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------------- Plan hash value: 1149720753 --------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 1074K| 85 (0)| 00:00:01 | | 1 | INDEX FAST FULL SCAN| IDX_TESTB_NAME | 100K| 1074K| 85 (0)| 00:00:01 | --------------------------------------------------------------------------------------- 8 rows selected. SQL> alter table testb modify name null; Table altered. SQL> explain plan for select /*+gather_plan_statistics*/ name from testb; Explained. SQL> @x2 PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 4088136327 --------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | --------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 100K| 1074K| 13557 (1)| 00:00:01 | | 1 | TABLE ACCESS FULL| TESTB | 100K| 1074K| 13557 (1)| 00:00:01 | --------------------------------------------------------------------------- 8 rows selected.
Note:
注意在oracle中需要name列有not null约束,因为在oracle中index是不记录null的,当有not null约束时可以使用index fast full scan多块读的方式更高效率的访问索引。
有兴趣的可以去阅读postgresql源码 关于cost计算 costsize.c
对不起,这篇文章暂时关闭评论。