首页 » ORACLE 9i-23ai, PostgreSQL/GaussDB » Alert : PostgreSQL inline Subquery or View 包含volatile functions 阻止谓词推进(Predicate PushDown)

Alert : PostgreSQL inline Subquery or View 包含volatile functions 阻止谓词推进(Predicate PushDown)

Oracle、Oceanbase、Kingbase、GaussDB、达梦数据库比较系列(二十七):子查询中的函数投影裁剪》 测试在inline subquery中包含函数列投影时裁剪或叫SLP(select list pruning) ,如果函数的Volate属性是volatile的影响,函数的不稳定性除了影响投影还有join 的view包含该函数时,影响谓词条件的推入等,最近在highgoDB遇到了一个SQL性能问题,其实所有pg系数据库如gaussdb,opengauss,kingbase等都存在。下面演示一下这个问题。
构建测试数据

-- pg
create table t1 as
 select x as id, 'anbob'||x name,'03-07-2023'::timestamptz +(b::integer||' minute')::interval  ctime 
from generate_series(1,10000) as x(b);

create index idx_t1_ctime on t1(ctime);

create table t2(id int,ctime date,name text);

insert into t2 values(1,now(),'anbob');
insert into t2 values(2,now()+'1 day'::interval,'weejar');
insert into t2 values(3,'03-07-2023 01:00:00'::timestamptz,'anbob');


CREATE OR REPLACE FUNCTION f2(id integer)
 returns bigint
 language 'plpgsql'
 as
 $$
 begin
 return case when id>10 then 1 else 2 end;
 end;
 $$ ;

create or replace view v1 
as
select id,name,ctime,f2(id) flag 
from t1 where name<>'anbob100';

select * from t2 , v1 
where t2.id=t2.id and t2.ctime >= to_date('2023-03-07 00:00:00', 'yyyy-mm-dd hh24:mi:ss') 
 AND t2.ctime < to_date('2023-03-08', 'yyyy-mm-dd hh24:mi:ss') 
and v1.ctime>t2.ctime-('100 minute')::interval
and v1.ctime<t2.ctime
and (t2.name <> 'anbob1' or t2.name<>'weejar1');

-- oracle
create table t1 tablespace users as
 select rownum id, 'anbob'||rownum name, to_date('03-07-2023','mm-dd-yyyy')+rownum/24/60   ctime 
from dual connect by rownum<=10000; 

create index idx_t1_ctime on t1(ctime);
create table t2(id int,ctime date,name varchar2(100)) tablespace users;
insert into t2 values(1,sysdate,'anbob'); 
insert into t2 values(2,sysdate+1,'weejar'); 
insert into t2 values(3,to_date('03-07-2023 01:00:00','mm-dd-yyyy hh24:mi:ss'),'anbob'); 

CREATE OR REPLACE FUNCTION f2(id integer) 
return int 
is 
begin 
  return case when id>10 then 1 else 2 ;
end;

create or replace view v1 
as
select id,name,ctime,f2(id) flag 
from t1 
where name<>'anbob100';

Note:
创建了两张表和一个view, 其中view中有一列是函数,注意这里的函数我仅为了演示使用的是常量,可以改稳定性,不排除如果是不能改稳定的函数时,这类问题就需要改SQL了。

先看oracle的执行计划

select * from t2 , v1 
where t2.id=t2.id and t2.ctime >= to_date('2023-03-07 00:00:00', 'yyyy-mm-dd hh24:mi:ss')                                                                                                                                          
             AND t2.ctime < to_date('2023-03-08', 'yyyy-mm-dd hh24:mi:ss') and v1.ctime>t2.ctime-100/24/60
and v1.ctime>t2.ctime- 100 /24/60
and v1.ctime<t2.ctime
and (t2.name <> 'anbob1' or t2.name<>'weejar1');

Plan hash value: 1917871493

----------------------------------------------------------------------------------------------
| Id  | Operation                     | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |              |    28 |  1092 |     6   (0)| 00:00:01 |
|*  1 |  FILTER                       |              |       |       |            |          |
|   2 |   NESTED LOOPS                |              |    28 |  1092 |     6   (0)| 00:00:01 |
|   3 |    NESTED LOOPS               |              |   260 |  1092 |     6   (0)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL         | T2           |     1 |    17 |     3   (0)| 00:00:01 |
|*  5 |     INDEX RANGE SCAN          | IDX_T1_CTIME |   260 |       |     2   (0)| 00:00:01 |
|*  6 |    TABLE ACCESS BY INDEX ROWID| T1           |    37 |   814 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter(TO_DATE('2023-03-08','yyyy-mm-dd hh24:mi:ss')>TO_DATE('2023-03-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - filter(("T2"."NAME"<>'anbob1' OR "T2"."NAME"<>'weejar1') AND
              "T2"."CTIME"<TO_DATE('2023-03-08','yyyy-mm-dd hh24:mi:ss') AND "T2"."ID" IS NOT NULL AND "T2"."CTIME">=TO_DATE(' 2023-03-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              INTERNAL_FUNCTION("T2"."CTIME")-.0694444444444444444444444444444444444445<TO_DATE('2023-03-08','yyyy-mm-dd hh24:mi:ss')) 
   5 - access("CTIME">INTERNAL_FUNCTION("T2"."CTIME")-.0694444444444444444444444444444
              444444445 AND "CTIME"<TO_DATE('2023-03-08','yyyy-mm-dd hh24:mi:ss'))
       filter("CTIME"<"T2"."CTIME")
   6 - filter("NAME"<>'anbob100')

Note:
#5 显示谓词已推到view v1,使用了ctime上的索引。

hint禁用oracle的特性观察一下

SQL> select /*+no_merge(v1)*/* from t2 , v1
where t2.id=t2.id and t2.ctime >= to_date('2023-03-07 00:00:00', 'yyyy-mm-dd hh24:mi:ss')
             AND t2.ctime < to_date('2023-03-08', 'yyyy-mm-dd hh24:mi:ss') 4 and v1.ctime>t2.ctime-100/24/60
and v1.ctime>t2.ctime- 100 /24/60
and v1.ctime<t2.ctime
and (t2.name <> 'anbob1' or t2.name<>'weejar1');

SQL> @x2

PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------------  
Plan hash value: 2510977736

-------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |              |    28 |  2128 |     7   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                          |              |    28 |  2128 |     7   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T2           |     1 |    17 |     3   (0)| 00:00:01 |
|   3 |   VIEW PUSHED PREDICATE                | V1           |     1 |    59 |     4   (0)| 00:00:01 |
|*  4 |    FILTER                              |              |       |       |            |          |
|*  5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1           |     4 |    88 |     4   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | IDX_T1_CTIME |    25 |       |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("T2"."NAME"<>'anbob1' OR "T2"."NAME"<>'weejar1') AND
              "T2"."CTIME"<TO_DATE('2023-03-08','yyyy-mm-dd hh24:mi:ss') AND "T2"."ID" IS NOT NULL AND "T2"."CTIME">=TO_DATE(' 2023-03-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   4 - filter("T2"."CTIME">INTERNAL_FUNCTION("T2"."CTIME")-.06944444444444444444444444444444444
              44445 AND INTERNAL_FUNCTION("T2"."CTIME")-.0694444444444444444444444444444444444445<TO_DATE('20 23-03-08','yyyy-mm-dd hh24:mi:ss') AND TO_DATE('2023-03-08','yyyy-mm-dd hh24:mi:ss')>TO_DATE('
              2023-03-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - filter("NAME"<>'anbob100')
   6 - access("CTIME">INTERNAL_FUNCTION("T2"."CTIME")-.0694444444444444444444444444444444444445
               AND "CTIME"<TO_DATE('2023-03-08','yyyy-mm-dd hh24:mi:ss'))
       filter("CTIME"<"T2"."CTIME") select /*+no_merge(v1) NO_PUSH_PRED(v1)*/* from t2 , v1 where t2.id=t2.id and t2.ctime >= to_date('2023-03-07 00:00:00', 'yyyy-mm-dd hh24:mi:ss')                                                                                                                                          
             AND t2.ctime < to_date('2023-03-08', 'yyyy-mm-dd hh24:mi:ss') and v1.ctime>t2.ctime-100/24/60
and v1.ctime 'anbob1' or t2.name<>'weejar1');

SQL> @x2
PLAN_TABLE_OUTPUT
-----------------------------------------------------------------------------------------------------------------------------  
Plan hash value: 1639856690

-------------------------------------------------------------------------------------------------------
| Id  | Operation                              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                       |              |    28 |  2128 |    14   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                          |              |    28 |  2128 |    14   (0)| 00:00:01 |
|*  2 |   TABLE ACCESS FULL                    | T2           |     1 |    17 |     3   (0)| 00:00:01 |
|*  3 |   VIEW                                 | V1           |    37 |  2183 |    11   (0)| 00:00:01 |
|*  4 |    FILTER                              |              |       |       |            |          |
|*  5 |     TABLE ACCESS BY INDEX ROWID BATCHED| T1           |  1439 | 31658 |    11   (0)| 00:00:01 |
|*  6 |      INDEX RANGE SCAN                  | IDX_T1_CTIME |  1439 |       |     5   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("T2"."NAME"<>'anbob1' OR "T2"."NAME"<>'weejar1') AND
              "T2"."CTIME"<TO_DATE('2023-03-08','yyyy-mm-dd hh24:mi:ss') AND "T2"."ID" IS NOT NULL AND "T2"."CTIME">=TO_DATE(' 2023-03-07 00:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              INTERNAL_FUNCTION("T2"."CTIME")-.0694444444444444444444444444444444444445<TO_DATE('2023-03-08',
              'yyyy-mm-dd hh24:mi:ss'))
   3 - filter("V1"."CTIME"<"T2"."CTIME" AND "V1"."CTIME">INTERNAL_FUNCTION("T2"."CTIME")-.06944
              44444444444444444444444444444444445)
   4 - filter(TO_DATE('2023-03-08','yyyy-mm-dd hh24:mi:ss')>TO_DATE(' 2023-03-07 00:00:00',
              'syyyy-mm-dd hh24:mi:ss'))
   5 - filter("NAME"<>'anbob100')
   6 - access("CTIME"<TO_DATE('2023-03-08','yyyy-mm-dd hh24:mi:ss'))

28 rows selected.

达梦

explain 
select * from t2 , v1 
where t2.id=t2.id and t2.ctime >= to_date('2023-03-07 00:00:00', 'yyyy-mm-dd hh24:mi:ss')                                                                                                                                          
             AND t2.ctime < to_date('2023-03-08', 'yyyy-mm-dd hh24:mi:ss') and v1.ctime>t2.ctime-100/24/60
and v1.ctime and v1.ctime>t2.ctime- 100 /24/60
and v1.ctime<t2.ctime
and (t2.name <> 'anbob1' or t2.name<>'weejar1');

1   #NSET2: [1, 375, 134]
2     #PRJT2: [1, 375, 134]; exp_num(7), is_atom(FALSE)
3       #SLCT2: [1, 375, 134]; T1.NAME <> 'anbob100'
4         #NEST LOOP INDEX JOIN2: [1, 375, 134]
5           #SLCT2: [1, 1, 65]; (NOT(T2.NAME IS NULL) AND NOT(T2.ID IS NULL) AND T2.CTIME >= var2 AND T2.CTIME < var4)
6             #CSCN2: [1, 3, 65]; INDEX33708464(T2); btr_scan(1)
7           #BLKUP2: [1, 375, 13]; IDX_T1_CTIME(T1)
8             #SSEK2: [1, 375, 13]; scan_type(ASC), IDX_T1_CTIME(T1), scan_range(exp_cast(T2.CTIME-exp_cast(100/24/60)),exp_cast(T2.CTIME))

used time: 30.703(ms). Execute id is 0.

Highgo V9(pg14)

SET track_io_timing = TRUE;
RESET enable_hashjoin;
RESET enable_sort;

explain (analyze,verbose)
select * from t2 , v1 
where t2.id=t2.id and t2.ctime >= to_date('2023-03-07 00:00:00', 'yyyy-mm-dd hh24:mi:ss')                                                                                                                                          
             AND t2.ctime < to_date('2023-03-08', 'yyyy-mm-dd hh24:mi:ss') and v1.ctime>t2.ctime-('100 minute')::interval
and v1.ctime >t2.ctime-('100 minute')::interval
and v1.ctime<t2.ctime
and (t2.name <> 'anbob1'  or t2.name<>'weejar1');

                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..2964.80 rows=1111 width=43) (actual time=12.208..12.209 rows=0 loops=1)
   Output: t2.id, t2.ctime, t2.name, t1.id, t1.name, t1.ctime, (f2(t1.id))
   Join Filter: ((t1.ctime < t2.ctime) AND (t1.ctime > (t2.ctime - '01:40:00'::interval)))
   Rows Removed by Join Filter: 9999
   ->  Seq Scan on public.t2  (cost=0.00..1.07 rows=1 width=14) (actual time=0.021..0.021 rows=1 loops=1)
         Output: t2.id, t2.ctime, t2.name
         Filter: ((t2.id IS NOT NULL) AND (t2.ctime >= to_date('2023-03-07 00:00:00'::text, 'yyyy-mm-dd hh24:mi:ss'::text)) AND (t2.ctime < to_date('2023-03-08'::text, 'yyyy-mm-dd hh24:mi:ss'::text)) AND ((t
2.name <> 'anbob1'::text) OR (t2.name <> 'weejar1'::text)))
         Rows Removed by Filter: 2
   ->  Seq Scan on public.t1  (cost=0.00..2688.75 rows=9999 width=29) (actual time=0.038..10.733 rows=9999 loops=1)
         Output: t1.id, t1.name, t1.ctime, f2(t1.id)
         Filter: (t1.name <> 'anbob100'::text)
         Rows Removed by Filter: 1
 Planning Time: 0.186 ms
 Execution Time: 12.288 ms
(14 rows)

Note:
这里t1表使用的全表扫seq scan. 使用了nested loop join最外层 join里过滤时间, 而t1表的大小就决定了SQL的运行时间,在生产上刚好我们的表是个大分区表有几亿条记录,运行时间要等十几分钟。开始以为是function cost影响,修改后function cost并不会影响v1的访问路径。

修改函数volatile到immutable

alter function f2 immutable;

SET track_io_timing = TRUE;
explain (analyze,verbose)
select * from t2 , v1 
where t2.id=t2.id and t2.ctime >= to_date('2023-03-07 00:00:00', 'yyyy-mm-dd hh24:mi:ss')                                                                                                                                          
             AND t2.ctime < to_date('2023-03-08', 'yyyy-mm-dd hh24:mi:ss') and v1.ctime>t2.ctime-('100 minute')::interval
and v1.ctime>t2.ctime-('100 minute')::interval
and v1.ctime<t2.ctime
and (t2.name <> 'anbob1' or t2.name<>'weejar1');


                                                QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.29..342.22 rows=1111 width=43) (actual time=0.055..0.056 rows=0 loops=1)
   Output: t2.id, t2.ctime, t2.name, t1.id, t1.name, t1.ctime, f2(t1.id)
   ->  Seq Scan on public.t2  (cost=0.00..1.07 rows=1 width=14) (actual time=0.024..0.025 rows=1 loops=1)
         Output: t2.id, t2.ctime, t2.name
         Filter: ((t2.id IS NOT NULL) AND (t2.ctime >= to_date('2023-03-07 00:00:00'::text, 'yyyy-mm-dd hh24:mi:ss'::text)) AND (t2.ctime < to_date('2023-03-08'::text, 'yyyy-mm-dd hh24:mi:ss'::text)) AND ((t
2.name <> 'anbob1'::text) OR (t2.name <> 'weejar1'::text)))
         Rows Removed by Filter: 2
   ->  Index Scan using idx_t1_ctime on public.t1  (cost=0.29..52.29 rows=1111 width=21) (actual time=0.026..0.026 rows=0 loops=1)
         Output: t1.id, t1.name, t1.ctime
         Index Cond: ((t1.ctime > (t2.ctime - '01:40:00'::interval)) AND (t1.ctime < t2.ctime))
         Filter: (t1.name <> 'anbob100'::text)
 Planning Time: 0.252 ms
 Execution Time: 0.102 ms
(12 rows)

Note:
这里ctime条件pushdown到了v1 view中,在访问T1时,使用上了索引,并且也做了投影裁剪,执行效率明显提升,而生产环境上的案例SQL直接从10几分钟提到秒级。 这里索引访问显示有Index Cond和Filter两个算子,目标是一样的,但是实现方式完全不同。Index Cond 用于从index索引中查找行位置的条件,Postgres 使用索引的结构化特性快速跳转到它要查找的行。不同的索引类型使用不同的策略。而在 “Filter” 中,根据row行的值检索并丢弃行。因此,您可以在同一操作中找到 “Index Cond” 和 “Filter”。 索引改为volatile,执行计划同immutable.

执行计划美化后,看起来更直观

不能推进的原因
当开源数据库找不到相关官方文档,查找源代码是个不错的方向 https://github.com/postgres/postgres/blob/master/src/backend/optimizer/path/allpaths.c
查找相关set_subquery_pathlist 里面的 subquery_is_pushdown_safe() 和 qual_is_pushdown_safe()函数.和contain_volatile_functions()

/* results of subquery_is_pushdown_safe */
typedef struct pushdown_safety_info
{
	unsigned char *unsafeFlags; /* bitmask of reasons why this target list
								 * column is unsafe for qual pushdown, or 0 if
								 * no reason. */
	bool		unsafeVolatile; /* don't push down volatile quals */
	bool		unsafeLeaky;	/* don't push down leaky quals */
} pushdown_safety_info;



/*
 * check_output_expressions - check subquery's output expressions for safety
 *
 * There are several cases in which it's unsafe to push down an upper-level
 * qual if it references a particular output column of a subquery.  We check
 * each output column of the subquery and set flags in unsafeFlags[k] when we
 * see that column is unsafe for a pushed-down qual to reference.  The
 * conditions checked here are:
 *
 * 1. We must not push down any quals that refer to subselect outputs that
 * return sets, else we'd introduce functions-returning-sets into the
 * subquery's WHERE/HAVING quals.
 *
 * 2. We must not push down any quals that refer to subselect outputs that
 * contain volatile functions, for fear of introducing strange results due
 * to multiple evaluation of a volatile function.
 *
 * 3. If the subquery uses DISTINCT ON, we must not push down any quals that
 * refer to non-DISTINCT output columns, because that could change the set
 * of rows returned.  (This condition is vacuous for DISTINCT, because then
 * there are no non-DISTINCT output columns, so we needn't check.  Note that
 * subquery_is_pushdown_safe already reported that we can't use volatile
 * quals if there's DISTINCT or DISTINCT ON.)
 *
 * 4. If the subquery has any window functions, we must not push down quals
 * that reference any output columns that are not listed in all the subquery's
 * window PARTITION BY clauses.  We can push down quals that use only
 * partitioning columns because they should succeed or fail identically for
 * every row of any one window partition, and totally excluding some
 * partitions will not change a window function's results for remaining
 * partitions.  (Again, this also requires nonvolatile quals, but
 * subquery_is_pushdown_safe handles that.).  Subquery columns marked as
 * unsafe for this reason can still have WindowClause run conditions pushed
 * down.
 */


/*
 * set_subquery_pathlist
 *		Generate SubqueryScan access paths for a subquery RTE
 *
 * We don't currently support generating parameterized paths for subqueries
 * by pushing join clauses down into them; it seems too expensive to re-plan
 * the subquery multiple times to consider different alternatives.
 * (XXX that could stand to be reconsidered, now that we use Paths.)
 * So the paths made here will be parameterized if the subquery contains
 * LATERAL references, otherwise not.  As long as that's true, there's no need
 * for a separate set_subquery_size phase: just make the paths right away.
 */


/*****************************************************************************
 *			PUSHING QUALS DOWN INTO SUBQUERIES
 *****************************************************************************/

/*
 * subquery_is_pushdown_safe - is a subquery safe for pushing down quals?
 *
 * subquery is the particular component query being checked.  topquery
 * is the top component of a set-operations tree (the same Query if no
 * set-op is involved).
 *
 * Conditions checked here:
 *
 * 1. If the subquery has a LIMIT clause, we must not push down any quals,
 * since that could change the set of rows returned.
 *
 * 2. If the subquery contains EXCEPT or EXCEPT ALL set ops we cannot push
 * quals into it, because that could change the results.
 *
 * 3. If the subquery uses DISTINCT, we cannot push volatile quals into it.
 * This is because upper-level quals should semantically be evaluated only
 * once per distinct row, not once per original row, and if the qual is
 * volatile then extra evaluations could change the results.  (This issue
 * does not apply to other forms of aggregation such as GROUP BY, because
 * when those are present we push into HAVING not WHERE, so that the quals
 * are still applied after aggregation.)
 *
 * 4. If the subquery contains window functions, we cannot push volatile quals
 * into it.  The issue here is a bit different from DISTINCT: a volatile qual
 * might succeed for some rows of a window partition and fail for others,
 * thereby changing the partition contents and thus the window functions'
 * results for rows that remain.
 *
 * 5. If the subquery contains any set-returning functions in its targetlist,
 * we cannot push volatile quals into it.  That would push them below the SRFs
 * and thereby change the number of times they are evaluated.  Also, a
 * volatile qual could succeed for some SRF output rows and fail for others,
 * a behavior that cannot occur if it's evaluated before SRF expansion.
 *
 * 6. If the subquery has nonempty grouping sets, we cannot push down any
 * quals.  The concern here is that a qual referencing a "constant" grouping
 * column could get constant-folded, which would be improper because the value
 * is potentially nullable by grouping-set expansion.  This restriction could
 * be removed if we had a parsetree representation that shows that such
 * grouping columns are not really constant.  (There are other ideas that
 * could be used to relax this restriction, but that's the approach most
 * likely to get taken in the future.  Note that there's not much to be gained
 * so long as subquery_planner can't move HAVING clauses to WHERE within such
 * a subquery.)
 *
 * In addition, we make several checks on the subquery's output columns to see
 * if it is safe to reference them in pushed-down quals.  If output column k
 * is found to be unsafe to reference, we set the reason for that inside
 * safetyInfo->unsafeFlags[k], but we don't reject the subquery overall since
 * column k might not be referenced by some/all quals.  The unsafeFlags[]
 * array will be consulted later by qual_is_pushdown_safe().  It's better to
 * do it this way than to make the checks directly in qual_is_pushdown_safe(),
 * because when the subquery involves set operations we have to check the
 * output expressions in each arm of the set op.
 *
 * Note: pushing quals into a DISTINCT subquery is theoretically dubious:
 * we're effectively assuming that the quals cannot distinguish values that
 * the DISTINCT's equality operator sees as equal, yet there are many
 * counterexamples to that assumption.  However use of such a qual with a
 * DISTINCT subquery would be unsafe anyway, since there's no guarantee which
 * "equal" value will be chosen as the output value by the DISTINCT operation.
 * So we don't worry too much about that.  Another objection is that if the
 * qual is expensive to evaluate, running it for each original row might cost
 * more than we save by eliminating rows before the DISTINCT step.  But it
 * would be very hard to estimate that at this stage, and in practice pushdown
 * seldom seems to make things worse, so we ignore that problem too.
 *
 * Note: likewise, pushing quals into a subquery with window functions is a
 * bit dubious: the quals might remove some rows of a window partition while
 * leaving others, causing changes in the window functions' results for the
 * surviving rows.  We insist that such a qual reference only partitioning
 * columns, but again that only protects us if the qual does not distinguish
 * values that the partitioning equality operator sees as equal.  The risks
 * here are perhaps larger than for DISTINCT, since no de-duplication of rows
 * occurs and thus there is no theoretical problem with such a qual.  But
 * we'll do this anyway because the potential performance benefits are very
 * large, and we've seen no field complaints about the longstanding comparable
 * behavior with DISTINCT.
 */

/*
	 * If there are any restriction clauses that have been attached to the
	 * subquery relation, consider pushing them down to become WHERE or HAVING
	 * quals of the subquery itself.  This transformation is useful because it
	 * may allow us to generate a better plan for the subquery than evaluating
	 * all the subquery output rows and then filtering them.
	 *
	 * There are several cases where we cannot push down clauses. Restrictions
	 * involving the subquery are checked by subquery_is_pushdown_safe().
	 * Restrictions on individual clauses are checked by
	 * qual_is_pushdown_safe().  Also, we don't want to push down
	 * pseudoconstant clauses; better to have the gating node above the
	 * subquery.
	 *
	 * Non-pushed-down clauses will get evaluated as qpquals of the
	 * SubqueryScan node.
	 *
	 * XXX Are there any cases where we want to make a policy decision not to
	 * push down a pushable qual, because it'd result in a worse plan?
	 */


/*
 * qual_is_pushdown_safe - is a particular rinfo safe to push down?
 *
 * rinfo is a restriction clause applying to the given subquery (whose RTE
 * has index rti in the parent query).
 *
 * Conditions checked here:
 *
 * 1. rinfo's clause must not contain any SubPlans (mainly because it's
 * unclear that it will work correctly: SubLinks will already have been
 * transformed into SubPlans in the qual, but not in the subquery).  Note that
 * SubLinks that transform to initplans are safe, and will be accepted here
 * because what we'll see in the qual is just a Param referencing the initplan
 * output.
 *
 * 2. If unsafeVolatile is set, rinfo's clause must not contain any volatile
 * functions.
 *
 * 3. If unsafeLeaky is set, rinfo's clause must not contain any leaky
 * functions that are passed Var nodes, and therefore might reveal values from
 * the subquery as side effects.
 *
 * 4. rinfo's clause must not refer to the whole-row output of the subquery
 * (since there is no easy way to name that within the subquery itself).
 *
 * 5. rinfo's clause must not refer to any subquery output columns that were
 * found to be unsafe to reference by subquery_is_pushdown_safe().
 */


/*
	 * Don't pull up a subquery that has any volatile functions in its
	 * targetlist.  Otherwise we might introduce multiple evaluations of these
	 * functions, if they get copied to multiple places in the upper query,
	 * leading to surprising results.  (Note: the PlaceHolderVar mechanism
	 * doesn't quite guarantee single evaluation; else we could pull up anyway
	 * and just wrap such items in PlaceHolderVars ...)
*/
	if (contain_volatile_functions((Node *) subquery->targetList))
		return false;

	return true;						 

Postgresql在处理view中的volatile 函数时,对predicat push-down 处理逻辑:
1, safe安全检查subquery_is_pushdown_safe
如果包含limit, EXCEPT or EXCEPT ALL ,distinct, window-function 集合返回函数,表示volatile 函数可能不安全
2,具体的谓词检查
在qual_is_pushdown_safe函数中会检查具体的谓词是否安全
如果unsafevolatile被标记为true, 谓词中包含volatile函数, 输出列检查check_output_expressions()如果输出列有volitile函数,无法pushdown。

 

小结:

因为volatile函数每次执行调用可能返回不同的结果,可以导致 结果不一致,如果视图中的volatile函数在输出列中,相关的谓词不会被下推。

 

感叹postgresql源代码中的注释写的很清楚,如果把这些训练专业的大模型,那回答Postgresql问题就一个专家。

打赏

,

目前这篇文章还没有评论(Rss)

我要评论