Troubleshooting ORA-00600: [kpncxcc-1] from dblink & ORA-01801
前几日一数据库实例alert日志中出现了ORA-600 [kpncxcc-1], kpn开头的函数通常用于服务器之间通信, 从trace中也可以发现是一条调用了dblink的select, 在MOS未发现已知BUG, 下面是分享这个案例及我的排查思路.
# trace file
Data Mining and Real Application Testing options
ORACLE_HOME = /oracle/app/oracle/product/11.2.0.3/dbhome_1
System name: HP-UX
Node name: weejar1
Release: B.11.31
Version: U
Machine: ia64
Instance name: anbob1
Redo thread mounted by this instance: 1
Oracle process number: 714
Unix process pid: 6006, image: oracle@weejar1
*** 2017-02-23 10:27:08.207
*** SESSION ID:(3801.15219) 2017-02-23 10:27:08.207
*** CLIENT ID:() 2017-02-23 10:27:08.207
*** SERVICE NAME:(SYS$USERS) 2017-02-23 10:27:08.207
*** MODULE NAME:(PL/SQL Developer) 2017-02-23 10:27:08.207
*** ACTION NAME:(SQL 窗口 - 新建) 2017-02-23 10:27:08.207
Dump continued from file: /oracle/app/oracle/diag/rdbms/tyjc/anbob1/trace/anbob1_ora_6006.trc
ORA-00600: internal error code, arguments: [kpncxcc-1], [12], [5], [7], [], [], [], [], [], [], [], []
========= Dump for incident 1573713 (ORA 600 [kpncxcc-1]) ========
*** 2017-02-23 10:27:08.209
dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
----- Current SQL Statement for this session (sql_id=gas0mmvd8fkar) -----
SELECT t.*,t.rowid FROM common.t_ucp_staffrole@lnk_db1.anbob.com t where enddate<sysdate
----- Abridged Call Stack Trace -----
ksedsts()+544<-kjzdssdmp()+400<-kjzdpcrshnfy()+512<-kstdmp()+416<-dbkedDefDump()+6032<-ksedmp()+64<-ksfdmp()
+96<-$cold_dbgexPhaseII()+576<-dbgexProcessError()+2096<-dbgeExecuteForError()+288<-dbgePostErrorKGE()+2368
<-dbkePostKGE_kgsf()+128<-kgeadse()+800<-kgerinv_internal()+80
<-kgerinv()+96<-kgeasnmierr()+144<-kpncxcc()+1456<-kpnrfcxc()+512<-kpnxrfc()+448<-kpnfchcbk()+448
----- End of Abridged Call Stack Trace -----
Note:
这是一个11.2.0.3 RAC on HPUX的环境, 是PL/SQL DEV用户客户端在执行一条SELECT 带DBLINK的语句触发. MOS中存在一个使用DBLINK和JDBC时的BUG与本案例不符, 在sqlplus中运行该SQL可以重现该问题.
SQL> SELECT t.*,t.rowid FROM common.t_ucp_staffrole@lnk_tbcs_a1.hebei.mobile.com t where enddate<sysdate; ERROR: ORA-00600: internal error code, arguments: [kpncxcc-1], [12], [5], [7], [], [], [], [], [], [], [], [] no rows selected
遇到该类问题,先去数据库的源端运行,查看是否同样存在该问题, 查看源表是否存在异常,下面在dblink的源库执行上面的查询.
SQL> select * from common.T_UCP_STAFFROLE where enddate<sysdate; ERROR: ORA-01801: date format is too long for internal buffer no rows selected
源端数据库同样无法执行, 不过是执行报错ORA-1801错误, 从错误的提示看应试是日期格式的记录存在异常. 该错误使用CTAS 方法无法验证无效记录, 测试是表有问题还是部分记录问题.
SQL> @desc common.T_UCP_STAFFROLE Name Null? Type ------------------------------- -------- ---------------------------- 1 ROLEID VARCHAR2(30) 2 STAFFID VARCHAR2(20) 3 CONDITIONSTR VARCHAR2(500) 4 HOSTEDCCID VARCHAR2(20) 5 STATUS NUMBER(2) 6 STATUSDATE DATE 7 CREATEDATE DATE 8 ORGAID VARCHAR2(32) 9 BEGINDATE DATE 10 ENDDATE DATE SQL> select * from nls_database_parameters; PARAMETER VALUE ------------------------------ ---------------------------------------- NLS_DATE_FORMAT DD-MON-RR NLS_DATE_LANGUAGE AMERICAN NLS_TIME_FORMAT HH.MI.SSXFF AM NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR ... SQL> select * from nls_session_parameters; PARAMETER VALUE ---------------------------- -------------------------------- NLS_DATE_FORMAT YYYYMMDD HH24:MI:SS NLS_DATE_LANGUAGE AMERICAN NLS_SORT BINARY NLS_TIME_FORMAT HH.MI.SSXFF AM NLS_TIMESTAMP_FORMAT DD-MON-RR HH.MI.SSXFF AM NLS_TIME_TZ_FORMAT HH.MI.SSXFF AM TZR NLS_TIMESTAMP_TZ_FORMAT DD-MON-RR HH.MI.SSXFF AM TZR ... SQL> alter session set NLS_DATE_FORMAT='DD-MON-RR'; Session altered. SQL> select * from common.T_UCP_STAFFROLE where enddate<sysdate; ERROR: ORA-01801: date format is too long for internal buffer SQL> select * from dba_indexes 2 where (owner, index_name) in 3 (select c.name, b.name 4 from ind$ a, obj$ b, user$ c 5 where to_char (a.spare6, 'DD-MON-YY') ='00-000-00' and 6 b.obj# = a.obj# and c.user# = b.owner#); no rows selected SQL> create table tt as select * from common.T_UCP_STAFFROLE where rownum<10; Table created. SQL> select * from tt where enddate<sysdate; no rows selected SQL> create table tt tablespace users as select * from common.t_ucp_staffrole; Table created. SQL> select * from tt where enddate<sysdate; ERROR: ORA-01801: date format is too long for internal buffer
Note:
上面尝试了修改SESSION DATA FORMAT和system相同问题依旧存在,排除了参数date format的可能; 且CTAS复制的表同样存在问题, 应试是部分记录有问题, 该表存在4个date类型列, 因select * 所以4个列都有异常记录的可能,不一定是条件中的enddate列, 其实去掉where条件查询一样报错, 只是表中存在几十万记录,不如遍历所有数据等到报错比较久. 下面使用排除法逐个验证.对于日期记录的问题通常要么是最大或最小值比较另类.
SQL> select min(STATUSDATE),min(CREATEDATE),min(BEGINDATE),min(ENDDATE) from common.T_UCP_STAFFROLE ; ERROR: ORA-01801: date format is too long for internal buffer no rows selected SQL> select min(STATUSDATE),min(CREATEDATE) from common.T_UCP_STAFFROLE where enddate<sysdate; ERROR: ORA-01801: date format is too long for internal buffer no rows selected SQL> select min(STATUSDATE) from common.T_UCP_STAFFROLE where enddate<sysdate; ERROR: ORA-01801: date format is too long for internal buffer no rows selected SQL> select min(CREATEDATE) from common.T_UCP_STAFFROLE where enddate<sysdate; MIN(CREATEDATE) ----------------- 20130912 19:21:36
NOTE:
定位到了列STATUSDATE, 下面就要找出这个列中无效记录. 我们可以使用to_char格式化后看最大和最小的几行记录应该可以排查.
SQL> select * from (
2 select rowid rid,to_char(STATUSDATE,'yyyy-mm-dd hh24:mi:ss') , dense_rank() over (order by statusdate ) rn
3 from common.T_UCP_STAFFROLE
4 ) where rn<=5;
RID TO_CHAR(STATUSDATE, RN
------------------ ------------------- ----------
AAAClXAE9AAA2ROAAA 0000-00-00 00:00:00 1
AAAClXAAuAADkqsAC9 0000-00-00 00:00:00 2
AAAClXAAuAADkqsAC+ 0000-00-00 00:00:00 2
AAAClXAAsAADf43ACm 2002-03-22 12:02:07 3
AAAClXAAtAADg6oAAY 2002-04-12 15:22:34 4
AAAClXAAtAADg68AE2 2002-04-12 15:22:34 4
AAAClXAAtAADg68AE3 2002-04-12 15:22:34 4
AAAClXAAtAADg68AE0 2002-04-12 15:22:34 4
AAAClXAAtAADg6oAAc 2002-04-12 15:22:34 4
AAAClXAAsAADf4PAAw 2002-04-21 16:14:02 5
AAAClXAAtAADg6ZACg 2002-04-21 16:14:02 5
AAAClXAAsAADf4fABw 2002-04-21 16:14:02 5
AAAClXAAsAADf4fABX 2002-04-21 16:14:02 5
AAAClXAAsAADf4bAEO 2002-04-21 16:14:02 5
Note:
前三条非常可疑.单独拿出来测试是否是这几条记录存在ora-1801的问题,和正常记录比对
SQL> select STATUSDATE from common.T_UCP_STAFFROLE where rowid='AAAClXAE9AAA2ROAAA'; ERROR: ORA-01801: date format is too long for internal buffer no rows selected SQL> select STATUSDATE from common.T_UCP_STAFFROLE where rowid='AAAClXAAsAADf43ACm'; STATUSDATE ----------------- 20020322 12:02:07 SQL> update common.T_UCP_STAFFROLE set STATUSDATE=CREATEDATE where to_char(STATUSDATE,'yyyy')='0000'; 3 rows updated. SQL> commit; Commit complete. SQL>select * from common.T_UCP_STAFFROLE where enddate<sysdate; -- no error
Note:
就是通过上面的方法找出date类型的列值异常记录, 因日期是’0000-00-00’格式数据错误, 对于其它无效数据用最大值或月份存在’00’ 的同样可以使用to_char后验证. 解决了源库的ORA-1801错误后, dblink 远程查询ORA-600 [kpncxcc-1]的问题也不存在.
对不起,这篇文章暂时关闭评论。