PostgreSQL 12 : Prepare statement和plan_cache_mode 参数
MySQL对所有的SQL都是硬解析,执行计划不能被复用。
PostgreSQL可以对SQL硬解析一次,后面再执行时复用这个执行计划,但执行计划不能在session之间共享。
Oracle有shared_pool,所有session的SQL执行计划都可以共享。
在SQL的初始解析阶段PostgreSQL和ORACLE rdbms有很多相似之处, 如开始会进行语法、语义的检查,那些元数据存在system表空间(oracle)或 system catalog(PostgreSQL). 然后根据之前预准备的统计信息(可能动态采样),CBO 会为SQL执行生成一个执行计划. 但是在解析完成后ORACLE和PG(以下用PG代替PostgreSQL)会存在一些差异,oracle会把执行计划存储在shared pool(Libaray cache)中对于所有会话可以共享,但是PG存储在program的本地内存中,只能给会话自身使用, 但是默认PG又不是总把sql plan 存储在program local memory中, 所以引入一种prepare statement.(多年前写Java 代码时里面JDK就有这个prepare statement对象相似)
PREPARE
creates a prepared statement. A prepared statement is a server-side object that can be used to optimize performance. When thePREPARE
statement is executed, the specified statement is parsed, analyzed, and rewritten. When anEXECUTE
command is subsequently issued, the prepared statement is planned and executed. This division of labor avoids repetitive parse analysis work, while allowing the execution plan to depend on the specific parameter values supplied.
简单的理解Prepare statement是一种服务器端对象,用于优化性能,prepare statement执行会解析、分析、查询重写等操作,随后使用execute命令执行就避免多次重复性的解析等工作。 在会话结束时prepare statement 会自动消失,也可以使用DEALLOCATE命令手动清除prepare statement. 多个会话可以自有的prepare stament.
用例
prepare ps1(integer) as update t_anbob set cn = cn + 1 where id = $1;
execute ps1(1);
execute ps1(2);
...
execute ps1(100000);
deallocate prepare ps1;
DBA可以使用pg_prepared_statements view查询会话中存在的所有prepare statement.
共享SQL执行计划在同一个会话执行大量类似相同的SQL时,prepare statement提升非常明显,但是对于执行不同的SQL 自然就不会有太大性能 提升。任何一样东西都有两面性,如果SQL 可以重启利用执行计划,但是对于存在数据倾斜的表,如一个表列值99% 是”男”,1%是”女”,那查询”女”是用索引效率会高,查询”男”时用全表效率就会更高,而不是利用同一执行计划。
在ORACLE中从9i R2 开始引入了bind peeking,并且后续的版本一直在增强这方面的灵活性, 如11g 中的Adaptive Cursor Sharing 和 11g R2中的Adaptive Cursor Sharing , 12c 中的Adaptive plans等, SO , as you know, 这些功能一直在有经验的DBA为追求稳定的环境时,事与愿违的禁掉了。
在PostgreSQL中即使使用了prepare statement, 也不是立即重启执行计划,也是需要执行足够的次数才会保存并重用通用(generic)的执行计划。这里又涉及到PG里的两个概念generic plan(通用执行计划)和custom plan(自定义执行计划),要检查PostgreSQL用于Prepare statement的执行计划,我们可以使用EXPLAIN。如果正在使用generic plan,它将包含参数符号$ n,而如果使用custom plan将使用当前实际参数值替换它。 在PostgreSQL 12版本中引入一个参数plan_cache_mode,有三个值auto(default) 、force_custom_plan 和 force_generic_plan ,应用于cache plan执行时。
下面还是直接演示一下区别。
[postgres@anbob19 ~]$ psql psql (12beta1) Type "help" for help. postgres=# select version(),current_setting('server_version_num'); version | current_setting ------------------------------------------------------------------------------------------------------------+----------------- PostgreSQL 12beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11), 64-bit | 120000 (1 row) postgres=# \x Expanded display is on. postgres=# \l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------+----------+-----------+---------+-------+----------------------- anbob | postgres | SQL_ASCII | C | C | postgres | postgres | SQL_ASCII | C | C | template0 | postgres | SQL_ASCII | C | C | =c/postgres + | | | | | postgres=CTc/postgres template1 | postgres | SQL_ASCII | C | C | =c/postgres + | | | | | postgres=CTc/postgres (4 rows) postgres=# \c anbob You are now connected to database "anbob" as user "postgres". anbob=# create table T_ANBOB as anbob-# select 1 n from generate_series(1,11) anbob-# union all select 2 from generate_series(1,22) anbob-# union all select 3 from generate_series(1,33) anbob-# union all select 4 from generate_series(1,44) anbob-# union all select 5 from generate_series(1,55) anbob-# union all select 6 from generate_series(1,66) anbob-# union all select 7 from generate_series(1,77) anbob-# union all select 8 from generate_series(1,88) anbob-# ; SELECT 396 anbob=# analyze T_ANBOB; ANALYZE anbob=# \dt List of relations Schema | Name | Type | Owner --------+---------+-------+---------- public | t_anbob | table | postgres anbob=# select count(*),count(distinct n) from T_ANBOB; count | count -------+------- 396 | 8 (1 row) anbob=# select n,count(*) from t_anbob group by n order by 1; n | count ---+------- 1 | 11 2 | 22 3 | 33 4 | 44 5 | 55 6 | 66 7 | 77 8 | 88 (8 rows) anbob=# select * from pg_stats where tablename = 't_anbob' and attname = 'n'; -[ RECORD 1 ]----------+-------------------------------------------------------------------------------------------- schemaname | public tablename | t_anbob attname | n inherited | f null_frac | 0 avg_width | 4 n_distinct | 8 most_common_vals | {8,7,6,5,4,3,2,1} most_common_freqs | {0.22222222,0.19444445,0.16666667,0.1388889,0.11111111,0.083333336,0.055555556,0.027777778} histogram_bounds | correlation | 1 most_common_elems | most_common_elem_freqs | elem_count_histogram |
Note:
这里我创建了一个表(单列),列上有8个不同的值,分别是11个1,22个2…88个8. 列上的freq 柱状图显示,8这个值88/396刚好就是0.2222, 7值是77/396约0.194444, 以此类推。
plan_cache_mode AUTO
查看当前的plan_cache_mode值
anbob=# select * from pg_settings where name = 'plan_cache_mode'; -[ RECORD 1 ]---+------------------------------------------------------------------------------------------------------------------- name | plan_cache_mode setting | auto unit | category | Query Tuning / Other Planner Options short_desc | Controls the planner's selection of custom or generic plan. extra_desc | Prepared statements can have custom and generic plans, and the planner will attempt to choose which is better. This can be set to override the default behavior. context | user vartype | enum source | default min_val | max_val | enumvals | {auto,force_generic_plan,force_custom_plan} boot_val | auto reset_val | auto sourcefile | sourceline | pending_restart | f
Note:
当前plan_cache_mode是AUTO.
anbob=# prepare pstat1 (int) as select count(*) from T_ANBOB where n=$1; PREPARE anbob=# explain (analyze) execute pstat1(1); QUERY PLAN --------------------------------------------------------------------------------------------------------- Aggregate (cost=6.98..6.99 rows=1 width=8) (actual time=0.043..0.043 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=11 width=0) (actual time=0.011..0.038 rows=11 loops=1) Filter: (n = 1) Rows Removed by Filter: 385 Planning Time: 0.319 ms Execution Time: 0.079 ms (6 rows) anbob=# explain (analyze) execute pstat1(2); QUERY PLAN --------------------------------------------------------------------------------------------------------- Aggregate (cost=7.00..7.01 rows=1 width=8) (actual time=0.212..0.212 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=22 width=0) (actual time=0.178..0.205 rows=22 loops=1) Filter: (n = 2) Rows Removed by Filter: 374 Planning Time: 0.052 ms Execution Time: 0.228 ms (6 rows) anbob=# explain (analyze) execute pstat1(3); QUERY PLAN --------------------------------------------------------------------------------------------------------- Aggregate (cost=7.03..7.04 rows=1 width=8) (actual time=0.043..0.043 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=33 width=0) (actual time=0.010..0.037 rows=33 loops=1) Filter: (n = 3) Rows Removed by Filter: 363 Planning Time: 0.047 ms Execution Time: 0.058 ms (6 rows) anbob=# explain (analyze) execute pstat1(4); ... anbob=# explain (analyze) execute pstat1(5); QUERY PLAN --------------------------------------------------------------------------------------------------------- Aggregate (cost=7.09..7.10 rows=1 width=8) (actual time=0.242..0.243 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=55 width=0) (actual time=0.044..0.212 rows=55 loops=1) Filter: (n = 5) Rows Removed by Filter: 341 Planning Time: 0.150 ms Execution Time: 0.290 ms (6 rows) anbob=# explain (analyze) execute pstat1(6); QUERY PLAN --------------------------------------------------------------------------------------------------------- Aggregate (cost=7.08..7.08 rows=1 width=8) (actual time=0.151..0.152 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=50 width=0) (actual time=0.059..0.125 rows=66 loops=1) Filter: (n = $1) Rows Removed by Filter: 330 Planning Time: 0.240 ms Execution Time: 0.205 ms (6 rows) anbob=# explain (analyze) execute pstat1(7); QUERY PLAN --------------------------------------------------------------------------------------------------------- Aggregate (cost=7.08..7.08 rows=1 width=8) (actual time=0.161..0.161 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=50 width=0) (actual time=0.084..0.131 rows=77 loops=1) Filter: (n = $1) Rows Removed by Filter: 319 Planning Time: 0.025 ms Execution Time: 0.214 ms (6 rows) anbob=# explain (analyze) execute pstat1(10); QUERY PLAN -------------------------------------------------------------------------------------------------------- Aggregate (cost=7.08..7.08 rows=1 width=8) (actual time=0.047..0.047 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=50 width=0) (actual time=0.045..0.045 rows=0 loops=1) Filter: (n = $1) Rows Removed by Filter: 396 Planning Time: 0.014 ms Execution Time: 0.072 ms (6 rows) anbob=# select count(n),count(distinct n), count(n)::numeric/count(distinct n) "avg rows per value" from T_ANBOB; count | count | avg rows per value -------+-------+--------------------- 396 | 8 | 49.5000000000000000 (1 row)
Note:
在auto默认情况下,前5次是costom plan 可以正确的估算返回行数,从第6次开始变成了generic plan, 并且是按平均值估算的,也未绑定变量窥探,对于值10不存在,同样也是估算的50. Oracle 在这方面是相反的,先generi plan, 如果有特殊值或数据倾斜时,如果需要再独立生成costom plan.
force_generic_plan
然后把plan_cache_mode改为force_generic_plan. 这有点像ORACLE中的CURSOR_SHARING=force; 同是还是 “ _optim_peek_user_binds”=false 或未收集列柱状图那样。
anbob=# prepare pstat1 (int) as select count(*) from T_ANBOB where n=$1; PREPARE anbob=# set plan_cache_mode=force_generic_plan; SET anbob=# explain (analyze) execute pstat1(1); QUERY PLAN --------------------------------------------------------------------------------------------------------- Aggregate (cost=7.08..7.08 rows=1 width=8) (actual time=0.053..0.054 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=50 width=0) (actual time=0.014..0.049 rows=11 loops=1) Filter: (n = $1) Rows Removed by Filter: 385 Planning Time: 0.071 ms Execution Time: 0.073 ms (6 rows) anbob=# explain (analyze) execute pstat1(9); QUERY PLAN -------------------------------------------------------------------------------------------------------- Aggregate (cost=7.08..7.08 rows=1 width=8) (actual time=0.112..0.112 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=50 width=0) (actual time=0.108..0.108 rows=0 loops=1) Filter: (n = $1) Rows Removed by Filter: 396 Planning Time: 0.013 ms Execution Time: 0.137 ms (6 rows)
force_generic_plan后,不再像auto 一样执行几次,它是可以立即重用plan, 但是行数是估算的平均值。对于不存在的值,同样估算是错误的。
force_custom_plan
anbob=# deallocate prepare pstat1 ; DEALLOCATE anbob=# set plan_cache_mode=force_custom_plan; SET anbob=# explain (analyze) execute pstat1(1); QUERY PLAN --------------------------------------------------------------------------------------------------------- Aggregate (cost=6.98..6.99 rows=1 width=8) (actual time=0.104..0.105 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=11 width=0) (actual time=0.026..0.094 rows=11 loops=1) Filter: (n = 1) Rows Removed by Filter: 385 Planning Time: 0.148 ms Execution Time: 0.148 ms (6 rows) anbob=# explain (analyze) execute pstat1(2); anbob=# explain (analyze) execute pstat1(3); anbob=# explain (analyze) execute pstat1(4); anbob=# explain (analyze) execute pstat1(5); anbob=# explain (analyze) execute pstat1(6); anbob=# explain (analyze) execute pstat1(7); anbob=# explain (analyze) execute pstat1(8); QUERY PLAN --------------------------------------------------------------------------------------------------------- Aggregate (cost=7.17..7.18 rows=1 width=8) (actual time=0.149..0.149 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=88 width=0) (actual time=0.087..0.118 rows=88 loops=1) Filter: (n = 8) Rows Removed by Filter: 308 Planning Time: 0.752 ms Execution Time: 0.200 ms (6 rows) anbob=# explain (analyze) execute pstat1(9); QUERY PLAN ------------------------------------------------------------------------------------------------------- Aggregate (cost=6.95..6.96 rows=1 width=8) (actual time=0.041..0.041 rows=1 loops=1) -> Seq Scan on t_anbob (cost=0.00..6.95 rows=1 width=0) (actual time=0.039..0.039 rows=0 loops=1) Filter: (n = 9) Rows Removed by Filter: 396 Planning Time: 0.073 ms Execution Time: 0.060 ms (6 rows)
Note:
force_custom_plan 永远使用custom plan,无论执行多少次, 有点像literal SQL, 但是又使用绑定变量,至少可以防止SQL注入,在ORACLE中没有发现相同的特性。可以用于数据极度不均衡,执行计划不准的场景。
Conclusion:
SQL 共享没有绝对的好坏,ORA, DB2 MSSQL商用数据库都可以多SESSION共享,但是MySQL ,PG 又都不可以跨SESSION共享, PG session级可以共享. 共享需要更多的内存和latch,不共享每次解析需要更多的cpu. 但是开源数据库因为软件场景基本都可控,更好的扩展性CPU不再是问题。 共享会存在更多的Library Cache争用影响并发。不共享可能在解决多表关连时会更消耗资源。但是对于oracle的共享相信也有遇到过shared pool碎片问题出现的ORA-4031等其它执行计划不稳定现象, 所有需要架构师和DBA了解开源与商用数据库的差别,更好的设计应用。
Reference Sergey Paramonov and @FranckPachot ‘s Articles
对不起,这篇文章暂时关闭评论。