PostgreSQL/openGauss explain解析(三): Heap Fetches
在上一篇中提到了indexonlyscan, 在它执行计划中可以看到有一行Heap Fetches,这篇主要记录一下它的含义。因为Postgresql系的MVCC实现原理,索引中不存在可见性映射(Visibility information),在PostgreSQL中的indexonlyscan 也并不总是scan index only, 简而言之就是如果表(heap)的数据没有对应可见性映射文件(table’s visibility map.)或不是全部完全可见,indexonlyscan的执行计划还是要回表(heap)去检查数据,回表数据记录在heap fetches. 下面简单的记录,经测试在opengauss和postgresql中还有一小差异,下文提到的postgresql默认也表示opengauss,实际测试在opengauss 5.0,如有不同会特殊说明。
背景
在postgresql中支持index scan和indexonlyscan 索引访问路径, PostgresSQL 仍然需要确保该行在返回它之前是可见的,并且该信息在table heap上,索引不包含可见性信息. 为什么 EXPLAIN 不显示index scan的heap fetches? 是因为index scan从heap中获取每一行,而indexonlyscan不是的。 如果索引类型支持indexonlyscan并且查询列在索引中,则查询所需的所有数据值都可以从索引中获得,因此仅索引扫描在物理上是可能的。 但是,PostgreSQL中的任何表扫描都有一个额外的要求:它必须验证每个检索到的行对查询的MVCC快照是否“可见”,可见性信息不存储在索引条目中,仅存储在堆条目中,似乎每行检索都需要堆访问。有一种方法可以解决此问题,PostgreSQL 跟踪表堆中每个页面,该页面中存储的所有行是否足够旧,以便对所有当前和未来的事务可见。此信息存储在表的可见性地图中的位中。仅索引扫描在找到候选索引条目后,会检查相应堆页的可见性映射位。如果已设置,则该行已知可见,因此无需进一步工作即可返回数据。如果未设置,则必须访问堆条目以了解它是否可见,因此与标准索引扫描相比,不会获得任何性能优势。即使在成功的情况下,这种方法也用可见性映射访问换取堆访问;但是,由于可见性映射比它描述的堆小四个数量级,因此访问它所需的物理 I/O 要少得多。在大多数情况下,可见性地图始终缓存在内存中。
对于大部分应用场景,确实存在仅仅少量数据的更新,这种表上绝大部分都可以设置了可见映射位,索引这种扫描方式较小的heap fetches, 可以从中收益。 并且索引列是尽可能改动少的列。
对于index only scan,Postgres 需要首先读取并评估可见性地图(visibility map )来绕过特定行的heap,该可见性地图记录了每个页面最近是否被更改。如果页面发生了变化, 在index only scan期间,Postgres 必须在表heap中查找的行数,而不是索引。就像我们进行正常的索引扫描一样,事实上,index only scan比index scan稍微糟糕一些,因为我们添加了对可见性地图的额外检查。有时可以通过Vacuum或调整Autovacuum设置来减少heap fetches,以使可见性地图保持最新状态。使用更小的百分比、更宽的表行、更高的基数更希望使用索引从中优于SEQ SCAN,页面中存储的所有行是否足够旧,以便对所有当前和未来的事务可见.
测试 –env openGauss 5.0
创建测试表
anbob=# create table testb as select * from testa where rownum<=50000; INSERT 0 50000 anbob=# create index idx_testb_name on testb(name); CREATE INDEX anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='testb'; relkind | relname | reltuples | relpages ---------+---------+-----------+---------- r | testb | 50000 | 271 (1 row) anbob=# explain (analyze,buffers)select name from testb; QUERY PLAN ---------------------------------------------------------------------------------------------------------- Seq Scan on testb (cost=0.00..771.00 rows=50000 width=58) (actual time=0.015..9.187 rows=50000 loops=1) (Buffers: shared hit=271) Total runtime: 15.831 ms (3 rows)
对于新创建的表没有做过vacuum或auto vacuum默认没有可见性映射文件(_vm). heap seq scan也就771 cost。
查看文件系统是否有vm文件
anbob=# select 'testb'::regclass::oid; oid ------- 16490 (1 row) anbob=# select pg_relation_filepath('testb'); pg_relation_filepath ---------------------- base/16385/16490 (1 row) og@oel7db1 16385]$ ls -l 16490* -rw------- 1 og og 2220032 Jun 10 18:10 16490 -rw------- 1 og og 24576 Jun 10 18:10 16490_fsm [og@oel7db1 16385]$
注意这个表有free space map文件(_fsm)但是没有visibility map文件(_vm),如果没有可见性映射,PostgreSQL 无法知道该页面中的所有行是否对所有当前事务可见,因此必须访问堆(heap)以获取该信息。
使用index only scan
anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='idx_testb_name'; relkind | relname | reltuples | relpages ---------+----------------+-----------+---------- i | idx_testb_name | 50000 | 194 (1 row) anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..2610.25 rows=50000 width=58) (actual time=0.148..49.640 rows=50000 loops=1) Heap Fetches: 50000 (Buffers: shared hit=9726 read=194) Total runtime: 56.483 ms (5 rows) anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..2610.25 rows=50000 width=58) (actual time=0.039..26.424 rows=50000 loops=1) Heap Fetches: 50000 (Buffers: shared hit=9919) Total runtime: 32.719 ms (5 rows)
Note:
注意在使用hint indexonlyscan 后total runtime和COST都远高于seq scan, 并且在两次执行后,buffers命中比较高的前提下。
VACUUM表
anbob=# vacuum testb; VACUUM [og@oel7db1 16385]$ ls -l 16490* -rw------- 1 og og 2220032 Jun 10 18:14 16490 -rw------- 1 og og 24576 Jun 10 18:14 16490_fsm -rw------- 1 og og 8192 Jun 10 18:14 16490_vm anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1526.25 rows=50000 width=58) (actual time=0.021..8.973 rows=50000 loops=1) Heap Fetches: 0 (Buffers: shared hit=194) Total runtime: 13.771 ms (5 rows)
note:
在做完Vacuum后, heap fetches有原来的52k降为0. 说明不再做回表查询, 一个真正的index only scan(尽管visibility map总是扫描), cost与runtime也明显降低得到性能提升。
pg_visibility插件
postgres=# create extension pg_visibility; CREATE EXTENSION postgres=# select pg_visibility_map('testb'::regclass, 0); pg_visibility_map ------------------- (t,f) (1 row)
NOTE:
pg_visibility插件提供了一种检查表的可见性映射 (VM) 和页面级可见性信息的方法。 它还提供了检查可见性地图完整性并强制重建它的功能。 这第1个“t”,表示true,意味着所有可见。
Vacuum对heap fetche在openGauss和PostgreSQL上的不同
测试均在一个空闲的数据库,只有测试1个session。
# openGAUSS 5
anbob=# \d+ testb Table "public.testb" Column | Type | Modifiers | Storage | Stats target | Description --------+-----------------------+-----------+----------+--------------+------------- id | integer | | plain | | name | character varying(20) | | extended | | Indexes: "idx_testb_name" btree (name) TABLESPACE pg_default Has OIDs: no Options: orientation=row, compression=no anbob=# update testb set id=id+1 where rownum<=1000; UPDATE 1000 anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1498.37 rows=48141 width=10) (actual time=0.068..12.791 rows=50000 loops=1) Heap Fetches: 2028 (Buffers: shared hit=2194 dirtied=8) Total runtime: 18.491 ms (5 rows) anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1498.37 rows=48141 width=10) (actual time=0.019..10.592 rows=50000 loops=1) Heap Fetches: 2028 (Buffers: shared hit=2194) Total runtime: 15.612 ms (5 rows) anbob=# SELECT schemaname,relname,last_autovacuum,last_autoanalyze,autovacuum_count,autoanalyze_count,last_vacuum,vacuum_count FROM pg_stat_all_tables where relname='testb'; schemaname | relname | last_autovacuum | last_autoanalyze | autovacuum_count | autoanalyze_count | last_vacuum | vacuum_count ------------+---------+-----------------+-------------------------------+------------------+-------------------+-------------------------------+-------------- public | testb | | 2023-06-10 18:19:16.718103+08 | 0 | 1 | 2023-06-13 09:39:14.157991+08 | 18 (1 row)
Note:
做DML后,index only scan 有2028的heap fetches, 多次查询并不会清理。
anbob=# vacuum verbose testb; -- 第一次vacuum INFO: vacuuming "public.testb"(oel7db1 pid=1848) INFO: index "idx_testb_name" now contains 51000 row versions in 194 pages(oel7db1 pid=1848) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "testb": found 0 removable, 2028 nonremovable row versions in 11 out of 276 pages(oel7db1 pid=1848) DETAIL: 1000 dead row versions cannot be removed yet. There were 0 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1539.36 rows=48250 width=10) (actual time=0.119..14.011 rows=50000 loops=1) Heap Fetches: 2028 (Buffers: shared hit=2195 dirtied=49) Total runtime: 19.288 ms (5 rows) anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1539.36 rows=48250 width=10) (actual time=0.019..10.869 rows=50000 loops=1) Heap Fetches: 1028 (Buffers: shared hit=201) Total runtime: 16.209 ms (5 rows) anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1539.36 rows=48250 width=10) (actual time=0.020..11.415 rows=50000 loops=1) Heap Fetches: 1028 (Buffers: shared hit=201) Total runtime: 18.165 ms (5 rows) anbob=# select relkind,relname,reltuples,relpages from pg_class where relname='testb'; relkind | relname | reltuples | relpages ---------+---------+-----------+---------- r | testb | 48250 | 276 (1 row) anbob=# vacuum verbose testb; -- 第二次vacuum INFO: vacuuming "public.testb"(oel7db1 pid=1848) INFO: scanned index "idx_testb_name" to remove 1000 row versions(oel7db1 pid=1848) DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: index "idx_testb_name" now contains 50000 row versions in 194 pages(oel7db1 pid=1848) DETAIL: 1000 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "testb": found 0 removable, 1028 nonremovable row versions in 11 out of 276 pages(oel7db1 pid=1848) DETAIL: 0 dead row versions cannot be removed yet. There were 75 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: vacuum 1663/16385/16490, "testb": removed 1000 row versions in 6 pages(oel7db1 pid=1848) DETAIL: CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1508.26 rows=47355 width=10) (actual time=0.034..8.768 rows=50000 loops=1) Heap Fetches: 103 (Buffers: shared hit=196) Total runtime: 14.308 ms (5 rows) anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1508.26 rows=47355 width=10) (actual time=0.017..10.182 rows=50000 loops=1) Heap Fetches: 103 (Buffers: shared hit=195) Total runtime: 16.030 ms (5 rows) anbob=# vacuum verbose testb; -- 第三次vacuum INFO: vacuuming "public.testb"(oel7db1 pid=1848) INFO: index "idx_testb_name" now contains 50000 row versions in 194 pages(oel7db1 pid=1848) DETAIL: 0 index row versions were removed. 0 index pages have been deleted, 0 are currently reusable. CPU 0.00s/0.00u sec elapsed 0.00 sec. INFO: "testb": found 0 removable, 103 nonremovable row versions in 6 out of 276 pages(oel7db1 pid=1848) DETAIL: 0 dead row versions cannot be removed yet. There were 1075 unused item pointers. 0 pages are entirely empty. CPU 0.00s/0.00u sec elapsed 0.00 sec. VACUUM anbob=# explain (analyze,buffers)select /*+indexonlyscan(b idx_testb_name)*/ name from testb b; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- [Bypass] Index Only Scan using idx_testb_name on testb b (cost=0.00..1472.68 rows=46429 width=10) (actual time=0.027..9.967 rows=50000 loops=1) Heap Fetches: 0 (Buffers: shared hit=194) Total runtime: 15.442 ms (5 rows) anbob=# SELECT schemaname,relname,last_autovacuum,last_autoanalyze,autovacuum_count,autoanalyze_count,last_vacuum,vacuum_count FROM pg_stat_all_tables where relname='testb'; schemaname | relname | last_autovacuum | last_autoanalyze | autovacuum_count | autoanalyze_count | last_vacuum | vacuum_count ------------+---------+-----------------+-------------------------------+------------------+-------------------+-------------------------------+-------------- public | testb | | 2023-06-10 18:19:16.718103+08 | 0 | 1 | 2023-06-13 10:21:10.874028+08 | 21 (1 row) anbob=# anbob=# \! sh show vacuum autovacuum | on | Starts the autovacuum subprocess. autovacuum_analyze_scale_factor | 0.1 | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples. autovacuum_analyze_threshold | 50 | Minimum number of tuple inserts, updates, or deletes prior to analyze. autovacuum_freeze_max_age | 4000000000 | Age at which to autovacuum a table. autovacuum_io_limits | -1 | Sets io_limit for autovacum. autovacuum_max_workers | 3 | Sets the maximum number of simultaneously running autovacuum worker processes. autovacuum_mode | mix | Sets the behavior of autovacuum autovacuum_naptime | 10min | Time to sleep between autovacuum runs. autovacuum_vacuum_cost_delay | 20ms | Vacuum cost delay in milliseconds, for autovacuum. autovacuum_vacuum_cost_limit | -1 | Vacuum cost amount available before napping, for autovacuum. autovacuum_vacuum_scale_factor | 0.2 | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples. autovacuum_vacuum_threshold | 50 | Minimum number of tuple updates or deletes prior to vacuum. enable_debug_vacuum | off | This parameter is just used for logging some vacuum info. log_autovacuum_min_duration | -1 | Sets the minimum execution time above which autovacuum actions will be logged. vacuum_cost_delay | 0 | Vacuum cost delay in milliseconds. vacuum_cost_limit | 200 | Vacuum cost amount available before napping. vacuum_cost_page_dirty | 20 | Vacuum cost for a page dirtied by vacuum. vacuum_cost_page_hit | 1 | Vacuum cost for a page found in the buffer cache. vacuum_cost_page_miss | 10 | Vacuum cost for a page not found in the buffer cache. vacuum_defer_cleanup_age | 0 | Number of transactions by which VACUUM and HOT cleanup should be deferred, if any. vacuum_freeze_min_age | 2000000000 | Minimum age at which VACUUM should freeze a table row. vacuum_freeze_table_age | 4000000000 | Age at which VACUUM should scan whole table to freeze tuples. vacuum_gtt_defer_check_age | 10000 | The defer check age of GTT, used to check expired data after vacuum.
Note:
在openGauss中测试是做了3次vacuum才把heap fetches降到0. 暂时未找到原因,后期了解后再更新。(update – 2024/04- 朋友反馈vacuum analyze 1次就可以,你可以试试)
# PostgreSQL 13
anbob=# select version(); version ------------------------------------------------------------ PostgreSQL 13.1, compiled by Visual C++ build 1914, 64-bit (1 行记录) anbob=# update testc set id=id+1 where id<10000; UPDATE 19996 anbob=# explain analyze select name from testc; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Index Only Scan using idx_testc_name on testc (cost=0.42..1769.59 rows=96178 width=10) (actual time=0.864..35.785 rows=110000 loops=1) Heap Fetches: 40059 Planning Time: 1.617 ms Execution Time: 38.599 ms (4 行记录) anbob=# vacuum verbose testc; 信息: 正在清理 (vacuum) "public.testc" 信息: 索引"idx_testc_name"在653个页中包含了行版本号108137 描述: 索引行版本0被删除. 0个索引页已经被删除,0当前可重用. CPU:用户:0.00 s,系统:0.00 s,已用时间:0.00 s. 信息: "testc": 在超出1340页的1中找到可删除版本号0, 不可删除的版本号19 描述: 0的死亡行版本还不能被移除,最老的xmin: 808 有60个未用的项标识符. 由于缓冲区占用而跳过0个页面,204个冻结页面. 0 个页面完全为空。 CPU:用户:0.00 s,系统:0.00 s,已用时间:0.07 s 信息: 正在清理 (vacuum) "pg_toast.pg_toast_73976" 信息: 索引"pg_toast_73976_index"在1个页中包含了行版本号0 描述: 索引行版本0被删除. 0个索引页已经被删除,0当前可重用. CPU:用户:0.00 s,系统:0.00 s,已用时间:0.00 s. 信息: "pg_toast_73976": 在超出0页的0中找到可删除版本号0, 不可删除的版本号0 描述: 0的死亡行版本还不能被移除,最老的xmin: 808 有0个未用的项标识符. 由于缓冲区占用而跳过0个页面,0个冻结页面. 0 个页面完全为空。 CPU:用户:0.00 s,系统:0.00 s,已用时间:0.00 s VACUUM anbob=# explain analyze select name from testc; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Index Only Scan using idx_testc_name on testc (cost=0.42..1975.97 rows=109937 width=10) (actual time=0.516..13.016 rows=110000 loops=1) Heap Fetches: 0 Planning Time: 80.679 ms Execution Time: 15.232 ms (4 行记录)
Note:
在PostgreSQL中一次vacuum就可以看到随后的index only scan的heap fetches就降为0.
小结:
在Postgresql系数据库中(如openGauss)index only scan并不总是仅索引, 需要扫描可见性映射文件(_vm)判断数据的可见性,如果有数据修改需要回表heap查询数据显示为heap fetches, 如果该值为0表示仅索引(和可见性映射文件)的扫描。 根据该表上发生的修改次数,PostgreSQL 很可能必须多次访问表heap,这当然会减慢SQL查询速度。 对于大多数块是静态的表,indexonlyscan非常好选择。
— 如果您发现有不对之处,请反馈给我。
对不起,这篇文章暂时关闭评论。