首页 » ORACLE 9i-23ai » Troubleshooting Oracle 19c hang wait ‘enq: TX – contention’ when RECO distributed Transaction recovery
Troubleshooting Oracle 19c hang wait ‘enq: TX – contention’ when RECO distributed Transaction recovery
Oracle 19c RAC实例大量会话hang,等待事件为’enq: TX – contention’, 做hanganalyze 堵塞进程为RECO, ‘buffer busy waits'<=’enq: TX – contention’ (cycle) RECO与前台进程形成死锁, 分布式事务表dba_2pc_pendings无记录。简单记录处理过程.
db alert log
ARC2 (PID:110273): Archived Log entry 73808 added for T-1.S-22324 ID 0x20b85f1e LAD:1 2023-02-26T15:33:53.503393+08:00 Error 22 trapped in 2PC on transaction 2259.15.2626285. Cleaning up. <<<<<<<<<<< Error stack returned to user: ORA-02054: transaction 2259.15.2626285 in-doubt ORA-00022: invalid session ID; access denied ORA-02063: preceding line from SXXX1 ---(shared public database link) 2023-02-26T15:33:53.521552+08:00 DISTRIB TRAN Txxxx.anbob.com.e18a8b02.2259.15.2626285 is local tran 2259.15.2626285 (hex=8d3.0f.2812ed) insert pending prepared tran, scn=17889696968055 (hex=0x0000104545086d77) 2023-02-26T15:33:54.081770+08:00 DISTRIB TRAN Txxxx.anbob.com.e18a8b02.2259.15.2626285 is local tran 2259.15.2626285 (hex=8d3.0f.2812ed) change pending prepared tran, scn=17889696968055 (hex=0x0000104545086d77) to pending collecting tran, scn=17889696968055 (hex=0x0000104545086d77)
hanganalyze trace
Chains most likely to have caused the hang: [a] Chain 1 Signature: 'buffer busy waits'<='enq: TX - contention' (cycle) Chain 1 Signature Hash: 0x14a7051f Cycles: ------------------------------------------------------------------------------- Chain 1: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 1 (anbob.anbob1) os id: 88106 process id: 4540, oracle@anbob1 session id: 11499 session serial #: 50576 module name: 0 (JDBC Thin ClientTNS V1-V3)TNS V1-V3))-V3)-V3)V3)) } is waiting for 'enq: TX - contention' with wait info: { p1: 'name|mode'=0x54580004 p2: 'usn<<16 | slot'=0x8d3000f p3: 'sequence'=0x2812ed time in wait: 6.424142 sec heur. time in wait: 13.050695 sec timeout after: 1.575858 sec wait id: 2868 blocking: 182 sessions current sql_id: 2609502947 current sql: insert into SP_.....) values( 310, :1 ,:2 ,:3 ,:4 ,:5 ,:6 ) short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2192<-kslwaitctx()+200<-kjusuc()+5742<-ksipgetctxia()+1968<-ksqcmi()+2757<-ksqgtlctx()+6620<-ksqgelctx()+838<-ktuGetTxForXid()+324<-ktcwit1()+8023<-ktuGetCommitTimesMain_Int()+1710<-ktuGetCommitTimesMain()+542<-ktuGetCommitTimes()+149<-ktbgcl1()+2645<-ktbgfi()+10962<-kdiins1()+40025<-kdiinsp0()+129<-kauxsin()+2041<-qesltcLoadIndexList()+873<-qerltcNoKdtBufferedInsRowCBK()+361<-qerltcSingleRowLoad( wait history: * time between current wait and wait #1: 0.000060 sec 1. event: 'enq: TX - contention' time waited: 3.998891 sec wait id: 2867 p1: 'name|mode'=0x54580004 p2: 'usn<<16 | slot'=0x8d3000f p3: 'sequence'=0x2812ed * time between wait #1 and #2: 0.000051 sec 2. event: 'enq: TX - contention' time waited: 2.001008 sec wait id: 2866 p1: 'name|mode'=0x54580004 p2: 'usn<<16 | slot'=0x8d3000f p3: 'sequence'=0x2812ed * time between wait #2 and #3: 0.000044 sec 3. event: 'enq: TX - contention' time waited: 0.626499 sec wait id: 2865 p1: 'name|mode'=0x54580004 p2: 'usn<<16 | slot'=0x8d3000f p3: 'sequence'=0x2812ed } and is blocked by => Oracle session identified by: { instance: 1 (anbob.anbob1) os id: 92451 process id: 75, oracle@anbob1 (RECO) session id: 16576 session serial #: 41963 } which is waiting for 'buffer busy waits' with wait info: { p1: 'file#'=0x1b p2: 'block#'=0x2766a p3: 'class#'=0x1 time in wait: 0.670083 sec heur. time in wait: 31 min 21 sec timeout after: never wait id: 49035 blocking: 182 sessions current sql_id: 0 current sql: short stack: ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+200<-__sighandler()<-semtimedop()+10<-skgpwwait()+192<-ksliwat()+2192<-kslwaitctx()+200<-kcbzwb()+2172<-kcbget()+42000<-ktbxchg()+151<-kdifind()+1016<-kdifind4undo()+52<-kdiulk()+15754<-kcoubk()+337<-ktundo()+1816<-kturCurrBackoutOneChg()+1230<-kturAbortInDoubtTxnMain()+803<-kturAbortInDoubtTxn()+952<-ktdabt()+383<-k2lrab()+56<-k2vrec()+800<-k2vdrvMain()+5992<-k2vdrv()+448<-ksb_act_run_int()+117<-ksb_act_run()+130<-ksbabs()+746<-ksbrdp()+1123<-opirip()+541<-opi wait history: * time between current wait and wait #1: 0.000119 sec 1. event: 'buffer busy waits' time waited: 0.999818 sec wait id: 49034 p1: 'file#'=0x1b p2: 'block#'=0x2766a p3: 'class#'=0x1 * time between wait #1 and #2: 0.000181 sec 2. event: 'buffer busy waits' time waited: 0.999882 sec wait id: 49033 p1: 'file#'=0x1b p2: 'block#'=0x2766a p3: 'class#'=0x1 * time between wait #2 and #3: 0.000125 sec 3. event: 'buffer busy waits' time waited: 0.999832 sec wait id: 49032 p1: 'file#'=0x1b p2: 'block#'=0x2766a p3: 'class#'=0x1 } and is blocked by the session at the start of the chain. ---大量会话被11499堵塞 行 4522: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 4574: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 4727: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 4779: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 4831: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 4987: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 5038: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 5090: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 5138: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 5186: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 5291: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 5395: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 5447: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 5499: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 5549: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 行 5601: and is blocked by 'instance: 1, os id: 88106, session id: 11499', 。。。 Chain 1 Signature: 'buffer busy waits'<='enq: TX - contention' (cycle) [11094]/1/11095/42517/0x9a538e200/91023/NLEAF/[11498] [11498]/1/11499/50576/0xa05321e20/88106/IN_HANG/[16575] ... Oradebug command 'hanganalyze 3' console output: Cycle 1: (1/16576)--(1/11499)
就是通过X$KTUXE中可以定位到dead prepared transaction,但在dba_2pc_pending视图中查不到
SQL> select * from x$ktuxe where KTUXESTA!='INACTIVE' and KTUXECFL like '%DEAD%'; ADDR INDX INST_ID CON_ID KTUXEUSN KTUXESLT KTUXESQN KTUXERDBF KTUXERDBB KTUXESCNB KTUXESCNW KTUXESCN KTUXESTA KTUXECFL KTUXEUEL KTUXEDDBF KTUXEDDBB KTUXEPUSN KTUXEPSLT KTUXEPSQN KTUXESIZ KXIDEXTFLAG ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- ------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------- 00007FFFF25B5600 76885 1 0 2259 15 2626285 3 405500 1158180215 4165 1.7890E+13 PREPARED SCO|COL|REV|DEAD 6 3 405500 0 0 0 2 0 SQL> select * from dba_2pc_pending; no rows selected
Note:
判断2pc 远程事务已完成,或SESSION已断开,仅剩local 事务需要处理。
解决方案
参考 How To Resolve Stranded DBA_2PC_PENDING Entries [ID 401302.1]
1、因为RECO为非致命进程,可以KILL处理,关于致命进程,可以查找我之前的blog.
kill reco
2,手动补分布式事务记录,回滚Local事务
alter system disable distributed recovery; insert into pending_trans$ ( LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID, STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME, RECO_TIME) values( '2259.15.2626285', /* <== Replace this with your local tran id */ 306206, /* */ 'XXXXXXX.12345.1.2.3', /* These values can be used without any */ 'prepared','P', /* modification. Most of the values are */ hextoraw( '00000001' ), /* constant. */ hextoraw( '00000000' ), /* */ 0, sysdate, sysdate ); insert into pending_sessions$ values( '2259.15.2626285',/* <==Replace only this with your local tran id */ 1, hextoraw('05004F003A1500000104'), 'C', 0, 30258592, '', 146 ); commit; rollback force '2259.15.2626285'; --or --commit force '2259.15.2626285'; delete from pending_trans$ where local_tran_id='2259.15.2626285'; delete from pending_sessions$ where local_tran_id='2259.15.2626285'; commit;
这类操作较多,可以参考MOS 自己批一个类型于COE的脚本,一键输出解决方案。同事写了一个供参考
CREATE OR REPLACE PROCEDURE sys.clean_dead_transactions AS -- 声明变量来存储死事务 COUNT(*) 的值和死事务的信息 v_count NUMBER(10); v_count1 number(10); v_trans_id VARCHAR2(100); v_ktuxesta x$ktuxe.ktuxesta%TYPE; v_ktuxecfl x$ktuxe.ktuxecfl%TYPE; v_ktuxesiz x$ktuxe.ktuxesiz%TYPE; BEGIN -- 执行死事务 COUNT(*) 查询 SELECT COUNT(*) INTO v_count FROM x$ktuxe WHERE KTUXESTA='PREPARED' AND KTUXECFL like'%DEAD'; -- 判断 COUNT(*) 是否为 0,如果是则退出 IF v_count = 0 THEN DBMS_OUTPUT.PUT_LINE('No dead trans!'); RETURN; END IF; -- 如果 COUNT(*) 不为 0,继续执行其他操作 DBMS_OUTPUT.PUT_LINE('====================================================================================='); DBMS_OUTPUT.PUT_LINE('Found ' || v_count || ' rows!'); -- 执行查询并将结果存储到变量中 FOR rec IN (SELECT KTUXEUSN || '.' || KTUXESLT || '.' || KTUXESQN trans_id, KTUXESTA, KTUXECFL, KTUXESIZ FROM x$ktuxe WHERE KTUXESTA='PREPARED' AND KTUXECFL like'%DEAD') LOOP v_trans_id := rec.trans_id; v_ktuxesta := rec.KTUXESTA; v_ktuxecfl := rec.KTUXECFL; v_ktuxesiz := rec.KTUXESIZ; -- 输出死事务信息 DBMS_OUTPUT.PUT_LINE('local_trans_id: ' || v_trans_id || ', Status: ' || v_ktuxesta || ', Flags: ' || v_ktuxecfl || ', KTUXESIZ: ' || v_ktuxesiz); DBMS_OUTPUT.PUT_LINE('====================================================================================='); -- 判断 local_trans_id 是否在 dba_2pc_pending 中不存在 select count(*) into v_count1 from dba_2pc_pending where LOCAL_TRAN_ID = v_trans_id; IF v_count1 <> 0 THEN DBMS_OUTPUT.PUT_LINE('This dead trans exists in dba_2pc_pending!'); RETURN; END IF; -- 如果 COUNT1(*) 为 0,继续执行其他操作 DBMS_OUTPUT.PUT_LINE('This dead trans does not exist in dba_2pc_pending'); -- 输出清理语句 DBMS_OUTPUT.PUT_LINE('====================================================================================='); DBMS_OUTPUT.PUT_LINE('alter system disable distributed recovery;'); DBMS_OUTPUT.PUT_LINE('INSERT INTO pending_trans (LOCAL_TRAN_ID, GLOBAL_TRAN_FMT, GLOBAL_ORACLE_ID, STATE, STATUS, SESSION_VECTOR, RECO_VECTOR, TYPE#, FAIL_TIME, RECO_TIME) VALUES (''' || v_trans_id || ''', 306206, ''XXXXXXX.12345.1.2.3'', ''prepared'', ''P'', hextoraw(''00000001''), hextoraw(''00000000''), 0, sysdate, sysdate);'); DBMS_OUTPUT.PUT_LINE('INSERT INTO pending_sessions$ VALUES (''' || v_trans_id || ''', 1, hextoraw(''05004F003A1500000104''), ''C'', 0, 30258592, '''', 146);'); DBMS_OUTPUT.PUT_LINE('COMMIT;'); DBMS_OUTPUT.PUT_LINE('====================================================================================='); DBMS_OUTPUT.PUT_LINE('rollback force ''' || v_trans_id || ''';'); DBMS_OUTPUT.PUT_LINE('OR'); DBMS_OUTPUT.PUT_LINE('commit force ''' || v_trans_id || ''';'); DBMS_OUTPUT.PUT_LINE('delete from pending_trans$ where local_tran_id= ''' || v_trans_id ||''';'); DBMS_OUTPUT.PUT_LINE('delete from pending_sessions$ where local_tran_id= ''' || v_trans_id ||''';'); DBMS_OUTPUT.PUT_LINE('commit;'); DBMS_OUTPUT.PUT_LINE('alter system enable distributed recovery ;'); DBMS_OUTPUT.PUT_LINE('====================================================================================='); END LOOP; END; /
对不起,这篇文章暂时关闭评论。