Troubleshooting 19c ORA-31626: job does not exist, ORA-955 QT_*BUFFER
一客户oracle 19c(19.9) RAC环境在尝试expdp导出时提示ora-31626然后中断,然后找到我,之前已做过一些尝试包括重建了数据库datapump组件(dpload.sql), 确实问题比较隐蔽, 下面简单的记录该问题。
$ expdp '/ as sysdba' directory=xxx schemas=xxx dumpfile =xxx logfile=xx job_name=exp1 ORA-31626: job does not exist
Note:
log 中并没有其他报错信息。
alert log
2022-07-14T15:30:59.439134+08:00 Thread 1 advanced to log sequence 34554 (LGWR switch), current SCN: 843425459 Current log# 3 seq# 34554 mem# 0: /orcldata/anbob/onlinelog/o1_mf_3_j7ldctg2_.log 2022-07-14T15:31:00.954791+08:00 ARC1 (PID:2311): Archived Log entry 62975 added for T-1.S-34553 ID 0x5c4a9838 LAD:1 2022-07-14T15:31:14.647495+08:00 TT00 (PID:2305): Error 1034 received logging on to the standby 2022-07-14T15:31:42.598301+08:00 Errors in file /u01/app/oracle/diag/rdbms/anbob/orcl/trace/orcl_ora_27067.trc: ORA-31626: job does not exist 2022-07-14T15:31:45.202992+08:00 Errors in file /u01/app/oracle/diag/rdbms/anbob/orcl/trace/orcl_ora_27067.trc: ORA-31626: job does not exist 2022-07-14T15:32:15.626814+08:00 Errors in file /u01/app/oracle/diag/rdbms/anbob/orcl/trace/orcl_ora_27067.trc: ORA-31626: job does not exist ORA-39086: cannot retrieve job information ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79 2022-07-14T15:34:30.167869+08:00 Errors in file /u01/app/oracle/diag/rdbms/anbob/orcl/trace/orcl_ora_31459.trc: ORA-00955: name is already used by an existing object 2022-07-14T15:34:32.991988+08:00 DM00 started with pid=133, OS id=31464, job SYS.EXP1
当时的ASH信息
SQL> @ase last_call USERNAME SID EVENT MACHINE MODULE STATUS et SEQ# SQL_ID WAI_SECINW ROW_WAIT_OBJ# SQLTEXT BS CH# OSUSER HEX ---------- ---------- -------------------- ---------- -------------------- ------- --------- ---------- --------------- ---------- ------------- ------------------------------ ---------- ---- ---------- ---------- XXX 3410 On CPU / runqueue XXX.XXaa JDBC Thin Client ACTIVE 0 34301 b1anbkr4qwfjy -1:0 -1 SELECT * FROM tabx WHERE : 0 root 10252dd SYS 1426 wait for unread mess XXX-db ude ACTIVE 75 30124 afcz9s4uazbpk 0:5 -1 BEGIN :1 := sys.kupc$que_int.t : 0 oracle 10000e3 SYS 1995 Streams AQ: enqueue XXX-db Data Pump Master ACTIVE 915 1003 8rgw5q94paqsg 0:15 569023 BEGIN sys.kupc$que_int.send(:1 : 0 oracle 1000011 SQL> select event event1 from v$session where sid=1426; EVENT1 ---------------------------------------------------------------- Streams AQ: enqueue blocked on low memory
此问题可能发生在AMM, ASMM 环境中(设置了 sga_target 或 memory_target),当buffer cache中有负载并且stream pool 内存正在移动到buffer cache
SQL> show parameter strea NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ streams_pool_size big integer 1G SQL> show parameter target NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ archive_lag_target integer 0 db_big_table_cache_percent_target string 0 db_flashback_retention_target integer 1440 fast_start_io_target integer 0 fast_start_mttr_target integer 0 memory_max_target big integer 0 memory_target big integer 0 parallel_servers_target integer 128 pga_aggregate_target big integer 300M sga_target big integer 28G target_pdbs integer 0 SQL> alter system set streams_pool_size=2048m; System altered. SQL> select shrink_phase_knlasg from X$KNLASG; SHRINK_PHASE_KNLASG ------------------- 1
如果遇到类似的减速,请检查以下查询是否不断返回“1”。该值表示流池处于收缩阶段。当流池完成收缩时,该值应该返回“0”,但如果它一直返回“1”,那么您可能遇到了这个问题, 这个问题不是报错的原因,继续分析报错信息。
启用datapump trace
alter system set events 'sql_trace {process: pname = dw | process: pname = dm} level=12'; 配置expdp ... TRACE=480300
启用 errorstack
SQL> oradebug eventdump system 955 trace name errorstack level 4 31626 trace name errorstack forever,level 3
Trace file
WARNING:1 Oracle process running out of OS kernelI/O resources request_aiolimit=128 set_aiolimit=0 <<<<<<<<<<<<<<<
Trace file /u01/app/oracle/diag/rdbms/anbob/orcl/trace/orcl_ora_28437.trc
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.9.0.0.0
Build label: RDBMS_19.9.0.0.0DBRU_LINUX.X64_200930
ORACLE_HOME: /u01/app/oracle/product/19.0.0/db
..
----- Error Stack Dump -----
at 0x7ffe6f931580 placed dbkda.c@296
ORA-00955: name is already used by an existing object
at 0x7ffe6f935018 placed cvw.c@1608
----- Current SQL Statement for this session (sql_id=6wksvt0zdd12f) -----
CREATE VIEW SYS.qt566658_BUFFER AS SELECT ADDR, INDX, INST_ID, OBJNO, QUEUE_ID, MSGID, CORRID, SEQUENCE_NUM, MSG_NUM, STATE, PRIORITY, EXPIRATION, ENQ_TIME, ENQ_UID, ENQ_USER_NAME, RETRY_COUNT, SENDER_NAME, SENDER_ADDRESS, SENDER_PROTOCOL, DEQUEUE_MSGID, SRCSEQUENCE_NUM, SUBSCRIBER_ID, EXCEPTIONQ_SCHEMA, EXCEPTIONQ_NAME FROM X$BUFFER2 x WHERE objno = 566658
----- PL/SQL Stack -----
----- PL/SQL Call Stack -----
object line object
handle number name
0xceec76d8 302 package body SYS.DBMS_AQADM_SYSCALLS.KWQA_3GL_EXECUTESTMT
0x126586b40 495 package body SYS.DBMS_AQADM_INV.EXECUTE_STMT2
0x15233d0d8 3292 package body SYS.DBMS_AQADM_SYS.CREATE_BUFFER_VIEW_COMMON
0x15233d0d8 3348 package body SYS.DBMS_AQADM_SYS.CREATE_BUFFER_VIEW
0x15233d0d8 8301 package body SYS.DBMS_AQADM_SYS.ADD_BUFFER_TABLES_INT
0x15233d0d8 8222 package body SYS.DBMS_AQADM_SYS.ADD_BUFFER_INT
0x15233d0d8 8071 package body SYS.DBMS_AQADM_SYS.ADD_BUFFER
0x11a2721e8 1546 package body SYS.KUPC$QUE_INT.CREATE_QUEUES
0x105ae3458 1 anonymous block
0x135fcd338 158 package body SYS.KUPC$QUEUE_INT.CREATE_QUEUES
0x17e845018 1002 package body SYS.KUPV$FT_INT.CREATE_NEW_JOB
0x145507b80 1646 package body SYS.KUPV$FT.CREATE_ATTACH_JOB
0x145507b80 1103 package body SYS.KUPV$FT.OPEN_JOB
0x15bc169e0 6809 package body SYS.DBMS_DATAPUMP.OPEN
..
NOTE:
从trace中又可以发现一个小问题WARNING:1 Oracle process running out of OS kernelI/O resources request_aiolimit=128 set_aiolimit=0
对于生产数据库系统,Oracle 建议您调整这些值以优化系统性能。
sysctl -w fs.aio_max_nr=4194304
And the maximum of aio-max-nr parameter can be calculated by aio-max-nr =no of process per DB * no of databases * 4096
you should still monitor the /proc/sys/fs/aio-nr value and also monitor I/O loading.
Points to know –
1) Number of processes per database is not background server process number, it’s the number of background server process + user process.
2) Parameter aio-max-nr is a limitation of aio-nr, no impact was reported if you set it too high. But we should still monitor /proc/sys/fs/aio-nr value and tune this value.
当然这也不是问题的关键, 另一个问题是创建CREATE VIEW SYS.qtxxxxx_BUFFER提示已存在。
QT_*BUFFER Views
它是高级排队功能的架构设计的一部分,在导出时会创建queue table,然后根据table的object_id 创建 QTNNNN_buffer.
For example:
1) Schema user SCOTT creates a queue_table called QT_1
2) QT_1 object_id=12345
3) An object of type VIEW gets created under schema SYS with name QT12345_BUFFER.
If the queue_table is dropped, so are underlying objects. And SYS.QTxxxxx_BUFFER is one of the underlying objects.
1* select owner,object_type,object_name,created from dba_objects where object_name like 'QT%BUFF%'
OWNER OBJECT_TYPE OBJECT_NAME CREATED
-------------------- ------------------------------ ------------------------------ -----------------
...
SYS VIEW QT21023_BUFFER 20190417 01:14:59
SYS VIEW QT22754_BUFFER 20190417 01:40:16
SYS VIEW QT566658_BUFFER 20220712 21:52:55
12 rows selected.
SQL> select owner,object_name,object_type from dba_objects where object_id=566658;
OWNER OBJECT_NAME OBJECT_TYPE
-------------------- ------------------------------ ------------------------------
SYS KUPC$DATAPUMP_QUETAB_1 TABLE
1* select * from dba_queues where queue_table='KUPC$DATAPUMP_QUETAB_1'
OWNER NAME QUEUE_TABLE QID QUEUE_TYPE MAX_RETRIES RETRY_DELAY ENQUEUE DEQUEUE RETENTION USER_COMMENT
-------------------- ------------------------------ ------------------------------ ---------- -------------------- ----------- ----------- ------- ------- ---------------------------------------- --------------------------------------------------
NETWORK_NAME
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SHARD
-----
SYS AQ$_KUPC$DATAPUMP_QUETAB_1_E KUPC$DATAPUMP_QUETAB_1 566702 EXCEPTION_QUEUE 0 0 NO NO 0 exception queue
FALSE
SQL> exec dbms_aqadm.stop_queue('AQ$_KUPC$DATAPUMP_QUETAB_1_E');
PL/SQL procedure successfully completed.
SQL> exec dbms_aqadm.drop_queue('AQ$_KUPC$DATAPUMP_QUETAB_1_E');
PL/SQL procedure successfully completed.
SQL> select owner,object_name,object_type from dba_objects where object_id=566658;
OWNER OBJECT_NAME OBJECT_TYPE
-------------------- ------------------------------ ------------------------------
SYS KUPC$DATAPUMP_QUETAB_1 TABLE
SQL> exec dbms_aqadm.drop_queue_table('KUPC$DATAPUMP_QUETAB_1');
PL/SQL procedure successfully completed.
--确认qt*buufer 已删除。
NOte:
提示报错的对象存在已删除,重新expdp发现还是报相同的错,更换table mode或导出用户,问题依旧。
根据报错的包调用,查看逻辑, 需要unwrap package body
PACKAGE BODY dbms_aqadm_inv AS
18 OBJECT_ALREADY_EXISTS EXCEPTION;
19 PRAGMA EXCEPTION_INIT(OBJECT_ALREADY_EXISTS, -955);
20
21 INDEX_DOES_NOT_EXIST EXCEPTION;
22 PRAGMA EXCEPTION_INIT(INDEX_DOES_NOT_EXIST, -1418);
23
24 NAME_IN_USE EXCEPTION;
25 PRAGMA EXCEPTION_INIT(NAME_IN_USE, -955);
PROCEDURE EXECUTE_STMT2(STMT IN VARCHAR2,
418 IGNORE_DDL_ERROR IN BOOLEAN DEFAULT FALSE,
419 SYSTEM_CURSOR IN BOOLEAN DEFAULT FALSE,
420 IGNORE_COMPERR IN BOOLEAN DEFAULT FALSE) IS
421
422 STREAMS_CAPTURE_OFF BOOLEAN;
424 BEGIN
425 DBMS_AQADM_SYS.WRITE_TRACE(MESSAGE => 'execute_stmt2: '|| STMT);
426 STREAMS_CAPTURE_OFF := DBMS_AQADM_SYSCALLS.KWQA_3GL_ISSTREAMSCAPTUREOFF;
427 IF (STREAMS_CAPTURE_OFF = FALSE) THEN
428 DBMS_AQADM_SYSCALLS.KWQA_3GL_SETSTREAMSCAPTUREOFF(TRUE);
429 END IF;
430
432 IF SYSTEM_CURSOR THEN
433 DBMS_AQADM_SYSCALLS.KWQA_3GL_EXECUTESTMT(STMT); <<<<< 1
434 ELSE
435 EXECUTE IMMEDIATE STMT;
436 END IF;
437 IF (STREAMS_CAPTURE_OFF = FALSE) THEN
438 DBMS_AQADM_SYSCALLS.KWQA_3GL_SETSTREAMSCAPTUREOFF(FALSE);
439 END IF;
440 EXCEPTION
441 WHEN TABLE_DOES_NOT_EXIST THEN
442 IF (STREAMS_CAPTURE_OFF = FALSE) THEN
443 DBMS_AQADM_SYSCALLS.KWQA_3GL_SETSTREAMSCAPTUREOFF(FALSE);
444 END IF;
445 IF NOT IGNORE_DDL_ERROR THEN
446 RAISE;
447 END IF;
448 WHEN INDEX_DOES_NOT_EXIST THEN
449 IF (STREAMS_CAPTURE_OFF = FALSE) THEN
450 DBMS_AQADM_SYSCALLS.KWQA_3GL_SETSTREAMSCAPTUREOFF(FALSE);
451 END IF;
WHEN NAME_IN_USE THEN
456 IF (STREAMS_CAPTURE_OFF = FALSE) THEN
457 DBMS_AQADM_SYSCALLS.KWQA_3GL_SETSTREAMSCAPTUREOFF(FALSE);
458 END IF;
459 IF NOT IGNORE_DDL_ERROR THEN
460 RAISE;
461 END IF;
NOTE:
只附了部分,但是package中没有找到创建两次qt*buffer view的逻辑.
启用10046 trace, 确认是创建了2次qt*view , 看来是调用了2次。19.9的环境从没来遇到过这类问题。MOS中有个类似的BUG 28485837. 但并没有19.9的RU 的one off path.
RU
[oracle@XXX-db OPatch]$ ./opatch lspatches
31668882;OJVM RELEASE UPDATE: 19.9.0.0.201020 (31668882)
31771877;Database Release Update : 19.9.0.0.201020 (31771877)
29585399;OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
OPatch succeeded.
注意, DB HOME中的ocw未安装19.9 补丁,ocw的问题之前遇到过几次,建议安装ocw 19.9,发现安装ocw后问题解决。
对不起,这篇文章暂时关闭评论。