PostgreSQL系(GaussDB、KingBase…) 在 SELECT 和 TRUNCATE之间lock冲突
在从Oracle迁移到PostgreSQL后,SELECT和TRUNCATE操作互相阻塞是一个常见问题,这主要是由于两种数据库的锁机制不同导致的。目前一客户项目在我们迁移到postgresql系的国产数据库生产环境后,时常在一个存储过程中包含truncate又有select 长事务时发生互相堵塞,简单记录.
原因分析
-
锁机制差异:
-
Oracle的TRUNCATE是DDL操作,Select Null 锁,通常不会与SELECT冲突
-
PostgreSQL中TRUNCATE需要获取ACCESS EXCLUSIVE锁,这是最强的锁级别,会阻塞所有其他操作
-
-
MVCC实现差异:
-
PostgreSQL的MVCC实现方式与Oracle不同
-
TRUNCATE在PostgreSQL中实际上是创建新文件并删除旧文件,需要完全独占访问
-
演示
— 这里使用kingbase
-- session 1 test=# create table test(id int); CREATE TABLE test=# insert into test values (1); INSERT 0 1 test=# commit; WARNING: there is no transaction in progress COMMIT test=# begin; BEGIN test=# select * from test; id ---- 1 (1 row) -- session 2 kingbase=# \c test You are now connected to database "test" as userName "system". test=# begin; BEGIN test=# truncate table test; waiting... -- session 3 kingbase=# \c test; You are now connected to database "test" as userName "system". test=# select * from test; waiting...
Note:
看到堵塞session 1 select — session 2 truncate — session select 堵塞链
kingbase=# select datname,pid,xact_start,query_start,wait_event_type,wait_event,state,query from pg_stat_activity; datname | pid | xact_start | query_start | wait_event_type | wait_event | state | query ----------+---------+-------------------------------+-------------------------------+-----------------+---------------------+---------------------+-------------------------------------- ------------------------------------------------------------------- .... kingbase | 2424743 | 2025-04-27 17:08:39.106147+08 | 2025-04-27 17:08:39.106147+08 | | | active | select datname,pid,xact_start,query_s tart,wait_event_type,wait_event,state,query from pg_stat_activity; test | 2380294 | 2025-04-27 17:06:00.351627+08 | 2025-04-27 17:06:07.249354+08 | Client | ClientRead | idle in transaction | select * from test; test | 2381560 | 2025-04-27 17:06:13.855348+08 | 2025-04-27 17:06:15.955823+08 | Lock | relation | active | truncate table test; test | 2423637 | 2025-04-27 17:06:26.795951+08 | 2025-04-27 17:06:26.795951+08 | Lock | relation | active | select * from test; ... (11 rows) kingbase=# select * from pg_locks; locktype | database | relation | page | tuple | virtualxid | transactionid | classid | objid | objsubid | virtualtransaction | pid | mode | granted | fastpath ---------------+----------+----------+------+-------+------------+---------------+---------+-------+----------+--------------------+---------+---------------------+---------+---------- virtualxid | | | | | 8/393 | | | | | 8/393 | 2423637 | ExclusiveLock | t | t virtualxid | | | | | 7/6 | | | | | 7/6 | 2381560 | ExclusiveLock | t | t virtualxid | | | | | 6/126 | | | | | 6/126 | 2380294 | ExclusiveLock | t | t relation | 13591 | 12085 | | | | | | | | 5/834432 | 2424743 | AccessShareLock | t | t virtualxid | | | | | 5/834432 | | | | | 5/834432 | 2424743 | ExclusiveLock | t | t relation | 13590 | 16799 | | | | | | | | 8/393 | 2423637 | AccessShareLock | f | f transactionid | | | | | | 459127 | | | | 7/6 | 2381560 | ExclusiveLock | t | f relation | 13590 | 16799 | | | | | | | | 6/126 | 2380294 | AccessShareLock | t | f relation | 13590 | 16799 | | | | | | | | 7/6 | 2381560 | AccessExclusiveLock | f | f (9 rows)
Note:
注意TRUNCATE 需要AccessExclusiveLock (table) ,而select 需要 AccessShareLock(Table), Truncate的互斥锁排他与很多事务都有锁冲突。可以见文档PostgreSQL Lock Conflicts
我之前的文章有记录过一些<脚本:用于分析PostgreSQL lock trees(堵塞树) 的查询>
SELECT activity.pid, activity.usename, activity.query, blocking.pid AS blocking_id, blocking.query AS blocking_query FROM pg_stat_activity AS activity JOIN pg_stat_activity AS blocking ON blocking.pid = ANY(pg_blocking_pids(activity.pid)); pid | usename | query | blocking_id | blocking_query ---------+---------+----------------------+-------------+---------------------- 2381560 | system | truncate table test; | 2380294 | select * from test; 2423637 | system | select * from test; | 2381560 | truncate table test; kingbase=# select pg_blocking_pids(2380294); pg_blocking_pids ------------------ {} (1 row) kingbase=# select pg_blocking_pids(2381560); pg_blocking_pids ------------------ {2380294} (1 row) kingbase=# select pg_blocking_pids(2423637); pg_blocking_pids ------------------ {2381560} (1 row) SELECT blocked_locks.pid AS blocked_pid, blocked_activity.usename AS blocked_user, blocking_locks.pid AS blocking_pid, blocking_activity.usename AS blocking_user, blocked_activity.query AS blocked_statement, blocking_activity.query AS current_statement_in_blocking_process FROM pg_catalog.pg_locks blocked_locks JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid AND blocking_locks.pid != blocked_locks.pid JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid WHERE NOT blocked_locks.granted; blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement | current_statement_in_blocking_process -------------+--------------+--------------+---------------+----------------------+--------------------------------------- 2381560 | system | 2423637 | system | truncate table test; | select * from test; 2381560 | system | 2380294 | system | truncate table test; | select * from test; 2423637 | system | 2380294 | system | select * from test; | select * from test; 2423637 | system | 2381560 | system | select * from test; | truncate table test;
kingbase=# select * from v_locks_monitor; locktype | datname | relation | page | tuple | transactionid | classid | objid | objsubid | lock_conflict ----------+---------+----------+------+-------+---------------+---------+-------+----------+--------------------------------------------------------------------------------------------- ------------------------------------------------------- relation | test | 16799 | | | | | | | Pid: 2381560 + | | | | | | | | | Lock_Granted: false , Mode: AccessExclusiveLock , Username: system , Database: test , Client _Addr: NULL , Client_Port: -1 , Application_Name: ksql+ | | | | | | | | | , Xact_Start: 2025-04-27 17:06:13.855348+08 , Query_Start: 2025-04-27 17:06:15.955823+08 , Xact_Elapse: +000000000 00:17:23.483185000 + | | | | | | | | | -------- + | | | | | | | | | 0Pid: 2380294 + | | | | | | | | | Lock_Granted: true , Mode: AccessShareLock , Username: system , Database: test , Client_Addr : NULL , Client_Port: -1 , Application_Name: ksql + | | | | | | | | | , Xact_Start: 2025-04-27 17:06:00.351627+08 , Query_Start: 2025-04-27 17:06:07.249354+08 , Xact_Elapse: +000000000 00:17:36.986906000 + | | | | | | | | | -------- + | | | | | | | | | 0Pid: 2380294 + | | | | | | | | | Lock_Granted: true , Mode: AccessShareLock , Username: system , Database: test , Client_Addr : NULL , Client_Port: -1 , Application_Name: ksql + | | | | | | | | | , Xact_Start: 2025-04-27 17:06:00.351627+08 , Query_Start: 2025-04-27 17:06:07.249354+08 , Xact_Elapse: +000000000 00:17:36.986906000 + | | | | | | | | | -------- + | | | | | | | | | 1Pid: 2423637 + | | | | | | | | | Lock_Granted: false , Mode: AccessShareLock , Username: system , Database: test , Client_Add r: NULL , Client_Port: -1 , Application_Name: ksql + | | | | | | | | | , Xact_Start: 2025-04-27 17:06:26.795951+08 , Query_Start: 2025-04-27 17:06:26.795951+08 , Xact_Elapse: +000000000 00:17:10.542582000 + | | | | | | | | | -------- + | | | | | | | | | (1 row)
预防措施
在迁移前充分测试锁行为
评估所有TRUNCATE操作的使用场景
培训团队了解PostgreSQL与Oracle在并发控制方面的差异
解决方案
在低峰期执行TRUNCATE操作
使用短事务,避免长事务与TRUNCATE冲突
通过设置较低的 lock_timeout 来优雅地处理 TRUNCATE 操作可能导致的阻塞问题。
DO $$ BEGIN FOR i IN 1..10 LOOP -- 最多尝试10次 BEGIN SET LOCAL lock_timeout = '100ms'; -- 设置很短的超时 TRUNCATE TABLE target_table; EXIT; -- 成功则退出循环 EXCEPTION WHEN OTHERS THEN RAISE NOTICE 'Attempt % failed, retrying...', i; PERFORM pg_sleep(0.1 * i); -- 指数退避 END; END LOOP; END $$; -- 封装函数 CREATE OR REPLACE FUNCTION safe_truncate(table_name text, max_attempts int DEFAULT 5) RETURNS void AS $$ DECLARE attempt int := 1; success boolean := false; BEGIN WHILE attempt <= max_attempts AND NOT success LOOP BEGIN EXECUTE format('SET LOCAL lock_timeout = %L', CASE WHEN attempt <= 2 THEN '50ms' WHEN attempt <= 4 THEN '100ms' ELSE '200ms' END); EXECUTE format('TRUNCATE TABLE %I', table_name); success := true; EXCEPTION WHEN OTHERS THEN RAISE NOTICE 'Truncate attempt %/% failed on %: %', attempt, max_attempts, table_name, SQLERRM; PERFORM pg_sleep(0.2 * attempt); -- 指数退避 attempt := attempt + 1; END; END LOOP; IF NOT success THEN RAISE EXCEPTION 'Failed to truncate % after % attempts', table_name, max_attempts; END IF; END; $$ LANGUAGE plpgsql; -- 使用示例 SELECT safe_truncate('my_large_table');
Note:
也可以如果多次失败,回退到DELETE + VACUUM方案
上一篇:
目前这篇文章还没有评论(Rss)