ORA-600 [kclfadd_1]: one node fails to start after installing a one-off patch

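A friend's database instance crashed with ORA-600 [kclfadd_1]. The database is a 10.2.0.2 two-node RAC, and he asked me to help find the cause. After identifying the bug, we applied the one-off patch rather than upgrading to a later patch set, and afterwards the instance on node 2 would no longer start. During startup the alert log showed several background processes starting, and once LCK0 had started the startup hung without reporting any error. Node 1 was fine. This post records the analysis.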

The ORA-600 [kclfadd_1] error

Tue Dec 19 09:23:52 2017
Completed: ALTER DATABASE OPEN
Tue Dec 19 09:24:46 2017
Shutting down archive processes
Tue Dec 19 09:24:51 2017
ARCH shutting down
ARC2: Archival stopped
Tue Dec 19 09:53:07 2017
Errors in file /u01/admin/anbob/udump/anbob1_ora_389530.trc:
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:17 2017
Errors in file /u01/admin/anbob/udump/anbob1_ora_389530.trc:
ORA-00600: 内部错误代码, 参数: [504], [0x700000010026AD8], [10], [2], [compile environment latch], [0], [0], [0x000000000]
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:29 2017
Trace dumping is performing id=[cdmp_20171219095329]
Tue Dec 19 09:53:31 2017
Errors in file /u01/admin/anbob/udump/anbob1_ora_389530.trc:
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:41 2017
Errors in file /u01/admin/anbob/udump/anbob1_ora_389530.trc:
ORA-00600: 内部错误代码, 参数: [504], [0x700000010026AD8], [10], [2], [compile environment latch], [0], [0], [0x000000000]
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:53 2017
Trace dumping is performing id=[cdmp_20171219095353]
Tue Dec 19 09:53:56 2017
Errors in file /u01/admin/anbob/bdump/anbob1_pmon_516246.trc:
ORA-00600: internal error code, arguments: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:57 2017
Errors in file /u01/admin/anbob/bdump/anbob1_pmon_516246.trc:
ORA-00600: internal error code, arguments: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:57 2017
PMON: terminating instance due to error 472
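
When an alert log repeats ORA-00600 lines as in the excerpt above, a tally by first argument can make it easier to see which internal error dominates before searching MOS. The following is a minimal sketch, not part of the original analysis: the function name `ora600_summary` is hypothetical, and the pattern relies only on the `ORA-00600:` prefix and the first bracketed argument, so it should work whatever language the message text is in.

```shell
# Count ORA-00600 occurrences in an alert log, grouped by first argument.
# Usage: ora600_summary /path/to/alert_SID.log
# (ora600_summary is an illustrative name, not an Oracle tool.)
ora600_summary() {
    grep 'ORA-00600' "$1" |
        # Keep only the text inside the first [...] after the prefix.
        sed -n 's/^ORA-00600:[^[]*\[\([^]]*\)].*/\1/p' |
        sort | uniq -c | sort -rn
}
```

Each output line is a count followed by the first argument, most frequent first. In the excerpt above, `kclfadd_1` appears far more often than `504`.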

MOS confirms this is a bug:
ORA-00600: internal error code, arguments: [kclfadd_1], [], [], [], [], [], [], []
Cause
This problem only affects 64bit releases of Oracle.
In a RAC environment a global cache element structure can
become corrupted leading to various ORA-600 and instance crashes.

Solution
It is recommended to:
1) Apply a Patch Set where this issue is fixed (10.2.0.3 onwards)
or
2) Apply one off patch for your platform if it’s available, please check PATCH 5071492.

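Installing the patch for bug 5071492
Below is the log from applying the one-off patch. The patch ID is 5077508, which the inventory lists as the fix for bug 5071492.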

oracle@ibmp55a1[/u01/5077508]$opatch apply
Invoking OPatch 10.2.0.2.0

Oracle interim Patch Installer version 10.2.0.2.0
Copyright (c) 2005, Oracle Corporation.  All rights reserved..

Oracle Home       : /u01/product
Central Inventory : /u01/oraInventory
   from           : /u01/product/oraInst.loc
OPatch version    : 10.2.0.2.0
OUI version       : 10.2.0.2.0
OUI location      : /u01/product/oui
Log file location : /u01/product/cfgtoollogs/opatch/opatch-00_Dec_21_14-38-26-GMT+08_Thu.log

ApplySession applying interim patch '5077508' to OH '/u01/product'
Invoking fuser to check for active processes.
Invoking fuser on "/u01/product/bin/oracle"

OPatch detected the node list and the local node from the inventory.  OPatch will patch the local system then propagate the patch to the remote nodes.

This node is part of an Oracle Real Application Cluster.
Remote nodes: 'ibmp55a2' 
Local node: 'ibmp55a1'
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/product')
Is the local system ready for patching?
Do you want to proceed? [y|n]
y
User Responded with: Y
Backing up files and inventory (not for auto-rollback) for the Oracle Home
Backing up files affected by the patch '5077508' for restore. This might take a while...
Backing up files affected by the patch '5077508' for rollback. This might take a while...
Patching component oracle.rdbms, 10.2.0.2.0...
Updating archive file "/u01/product/lib/libserver10.a"  with "lib/libserver10.a/kjb.o"
Running make for target ioracle
ApplySession adding interim patch '5077508' to inventory
Verifying the update...
Inventory check OK: Patch ID 5077508 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 5077508 are present in Oracle Home.

The local system has been patched.  You can restart Oracle instances on it.
Patching in rolling mode.
The node 'ibmp55a2' will be patched next.
Please shutdown Oracle instances running out of this ORACLE_HOME on 'ibmp55a2'.
(Oracle Home = '/u01/product')
Is the node ready for patching?
Do you want to proceed? [y|n]
n
User Responded with: N
ApplySession exits on request
You may exit the patching session and patch remaining nodes later from an un-patched node.  Do you want to continue?
Do you want to proceed? [y|n]
y
User Responded with: Y
Updating nodes 'ibmp55a2' 
   Apply-related files are:
     FP = "/u01/product/.patch_storage/5077508_Mar_3_2006_18_19_39/rac/copy_files.txt"
     DP = "/u01/product/.patch_storage/5077508_Mar_3_2006_18_19_39/rac/copy_dirs.txt"
     MP = "/u01/product/.patch_storage/5077508_Mar_3_2006_18_19_39/rac/make_cmds.txt"
Propagating files to remote nodes...
Propagating directories to remote nodes...
Running command on remote node 'ibmp55a2': cd /u01/product/rdbms/lib; /usr/ccs/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/product || echo REMOTE_MAKE_FAILED::>&2 
--------------------------------------------------------------------------------
WARNING for re-link on remote node 'ibmp55a2':
OPatch completed the command 'cd /u01/product/rdbms/lib; /usr/ccs/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/product || echo REMOTE_MAKE_FAILED::>&2 ' with warnings.
This command is from the file '/u01/product/.patch_storage/5077508_Mar_3_2006_18_19_39/rac/make_cmds.txt.instantiated', line number '1'
Probable cause:         chmod 755 /u01/product/bin - Linking Oracle     rm -f /u01/product/rdbms/lib/oracle     ld -b64 -o /u01/product/rdbms/lib/oracle -L/u01/product/rdbms/lib/ -L/u01/product/lib/  -bbigtoc -bnoipath -bI:/u01/product/lib/ksms.imp /u01/product/rdbms/lib/opimai.o /u01/product/rdbms/lib/ssoraed.o /u01/product/rdbms/lib/ttcsoi.o  -lperfsrv10 /u01/product/lib/nautab.o /u01/product/lib/naeet.o /u01/product/lib/naect.o /u01/product/lib/naedhs.o /u01/product/rdbms/lib/config.o -bI:/usr/lib/aio.exp   -lserver10 /u01/product/lib/libodm10.so -lnnet10  -lskgxp10 -lsthasgen10 /u01/product/has/lib/clssgc.o /u01/product/lib/libstskgxn2.a -lstocr10 -lstocrb10  -lstocrutl10 -lsthasgen10 /u01/product/has/lib/clssgc.o /u01/product/lib/libstskgxn2.a   -lclient10  -lvsn10  -lcommon10 -lgeneric10  /u01/product/rdbms/lib/defopt.o -lknlopt  `if /bin/ar -X64 tv /u01/product/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap10 -bE:/u01/product/rdbms/lib/olap.exp" ; fi`  -lslax10 -lpls10  -lplp10 -bE:/u01/product/rdbms/lib/plsqlncomp.exp  /u01/product/lib/libstclsra10.a -lstdbcfg10 -lserver10 -lclient10  -lvsn10  -lcommon10 -lgeneric10  -lknlopt -lslax10 -lpls10  -lplp10  -ljox10 -bE:/u01/product/rdbms/lib//oracle.exp   `sed -e 's/-ljava//g' /u01/product/lib/ldflags`      -lncrypt10 -lnsgr10 -lnzjs10 -ln10  -lnnz10 -lnl10 -lnzjs10 -lnro10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags`      -lncrypt10 -lnsgr10 -lnzjs10 -ln10  -lnnz10 -lnl10 -lnzjs10 -lclient10  -lvsn10  -lcommon10 -lgeneric10   -lmm -lsnls10 -lnls10  -lcore10 -lsnls10  -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10  -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags`      -lncrypt10 -lnsgr10 -lnzjs10 -ln10  -lnnz10 -lnl10 -lnzjs10 -lnro10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags`      -lncrypt10 -lnsgr10 -lnzjs10 -ln10  -lnnz10 -lnl10 -lnzjs10 -lclient10  -lvsn10  -lcommon10 -lgeneric10 -lpls10   -lsnls10 -lnls10  -lcore10 
-lsnls10  -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10  -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10  -lclient10  -lvsn10  -lcommon10 -lgeneric10 -lsnls10 -lnls10  -lcore10 -lsnls10  -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10  -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lserver10 `if /bin/ar -X64 tv /u01/product/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo10"; fi` -lctxc10 -lctx10 -lzx10 -lgx10 -lctx10 -lzx10 -lgx10 -lordimt10  -lsnls10 -lnls10  -lcore10 -lsnls10  -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10  -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lsnls10 -lunls10  -bE:/u01/product/rdbms/lib//libcorejava.exp  -lld -lm `cat /u01/product/lib/sysliblist`  -lm  `if [ "\`/usr/bin/uname -v\`" = "4" ]; \        then echo "-bI:/u01/product/lib/pw-syscall.exp"; fi;`  `if /bin/ar -X64 t /u01/product/rdbms/lib/libknlopt.a | grep '^'kcsm.o > /dev/null 2>&1 ; then echo "-lha_gs_r -lha_em_r -lpthreads"; fi` -locijdbcst10  -lwwg  -bpT:0x100000000 -bpD:0x110000000 -bforceimprw       mv -f /u01/product/bin/oracle /u01/product/bin/oracleO  mv /u01/product/rdbms/lib/oracle /u01/product/bin/oracle        chmod 6751 /u01/product/bin/oracleld: 0711-415 WARNING: Symbol plzcls is already exported.ld: 0711-415 WARNING: Symbol plzexe is already exported.ld: 0711-415 WARNING: Symbol plzopn is already exported.ld: 
...
0711-783 WARNING: TOC overflow. TOC size: 141728        Maximum size: 65536     Extra instructions are being generated for each reference to a TOC      symbol if the symbol is in the TOC overflow area. :failed
The node 'ibmp55a2' has been patched.  You can restart Oracle instances on it.

There were relinks on remote nodes.  Remember to check the binary size and timestamp on the nodes 'ibmp55a2' .
The following make commands were invoked on remote nodes:
'cd /u01/product/rdbms/lib; /usr/ccs/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/product
'
--------------------------------------------------------------------------------
The following warnings have occurred during OPatch execution:
1) OUI-67212:
--------------------------------------------------------------------------------
WARNING for re-link on remote node 'ibmp55a2':
OPatch completed the command 'cd /u01/product/rdbms/lib; /usr/ccs/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/product || echo REMOTE_MAKE_FAILED::>&2 ' with warnings.
This command is from the file '/u01/product/.patch_storage/5077508_Mar_3_2006_18_19_39/rac/make_cmds.txt.instantiated', line number '1'
Probable cause:         chmod 755 /u01/product/bin - Linking Oracle     rm -f /u01/product/rdbms/lib/oracle     ld -b64 -o /u01/product/rdbms/lib/oracle -L/u01/product/rdbms/lib/ -L/u01/product/lib/  -bbigtoc -bnoipath -bI:/u01/product/lib/ksms.imp /u01/product/rdbms/lib/opimai.o /u01/product/rdbms/lib/ssoraed.o /u01/product/rdbms/lib/ttcsoi.o  -lperfsrv10 /u01/product/lib/nautab.o /u01/product/lib/naeet.o /u01/product/lib/naect.o /u01/product/lib/naedhs.o /u01/product/rdbms/lib/config.o -bI:/usr/lib/aio.exp   -lserver10 /u01/product/lib/libodm10.so -lnnet10  -lskgxp10 -lsthasgen10 /u01/product/has/lib/clssgc.o /u01/product/lib/libstskgxn2.a -lstocr10 -lstocrb10  -lstocrutl10 -lsthasgen10 /u01/product/has/lib/clssgc.o /u01/product/lib/libstskgxn2.a   -lclient10  -lvsn10  -lcommon10 -lgeneric10  /u01/product/rdbms/lib/defopt.o -lknlopt  `if /bin/ar -X64 tv /u01/product/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap10 -bE:/u01/product/rdbms/lib/olap.exp" ; fi`  -lslax10 -lpls10  -lplp10 -bE:/u01/product/rdbms/lib/plsqlncomp.exp  /u01/product/lib/libstclsra10.a -lstdbcfg10 -lserver10 -lclient10  -lvsn10  -lcommon10 -lgeneric10  -lknlopt -lslax10 -lpls10  -lplp10  -ljox10 -bE:/u01/product/rdbms/lib//oracle.exp   `sed -e 's/-ljava//g' /u01/product/lib/ldflags`      -lncrypt10 -lnsgr10 -lnzjs10 -ln10  -lnnz10 -lnl10 -lnzjs10 -lnro10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags`      -lncrypt10 -lnsgr10 -lnzjs10 -ln10  -lnnz10 -lnl10 -lnzjs10 -lclient10  -lvsn10  -lcommon10 -lgeneric10   -lmm -lsnls10 -lnls10  -lcore10 -lsnls10  -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10  -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags`      -lncrypt10 -lnsgr10 -lnzjs10 -ln10  -lnnz10 -lnl10 -lnzjs10 -lnro10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags`      -lncrypt10 -lnsgr10 -lnzjs10 -ln10  -lnnz10 -lnl10 -lnzjs10 -lclient10  -lvsn10  -lcommon10 -lgeneric10 -lpls10   -lsnls10 -lnls10  -lcore10 
-lsnls10  -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10  -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10  -lclient10  -lvsn10  -lcommon10 -lgeneric10 -lsnls10 -lnls10  -lcore10 -lsnls10  -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10  -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lserver10 `if /bin/ar -X64 tv /u01/product/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo10"; fi` -lctxc10 -lctx10 -lzx10 -lgx10 -lctx10 -lzx10 -lgx10 -lordimt10  -lsnls10 -lnls10  -lcore10 -lsnls10  -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10  -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lsnls10 -lunls10  -bE:/u01/product/rdbms/lib//libcorejava.exp  -lld -lm `cat /u01/product/lib/sysliblist`  -lm  `if [ "\`/usr/bin/uname -v\`" = "4" ]; \        then echo "-bI:/u01/product/lib/pw-syscall.exp"; fi;`  `if /bin/ar -X64 t /u01/product/rdbms/lib/libknlopt.a | grep '^'kcsm.o > /dev/null 2>&1 ; then echo "-lha_gs_r -lha_em_r -lpthreads"; fi` -locijdbcst10  -lwwg  -bpT:0x100000000 -bpD:0x110000000 -bforceimprw       mv -f /u01/product/bin/oracle /u01/product/bin/oracleO  mv /u01/product/rdbms/lib/oracle /u01/product/bin/oracle        chmod 6751 /u01/product/bin/oracle ld: 0711-415 WARNING: Symbol plzcls is 
...
...
/lib//libserver10.a[dmbu.o], imported symbol _DBLINF       Symbol was expected to be local. Extra instructions     are being generated to reference the symbol.ld: 0711-783 WARNING: TOC overflow. TOC size: 141728        Maximum size: 65536     Extra instructions are being generated for each reference to a TOC      symbol if the symbol is in the TOC overflow area. :failed
--------------------------------------------------------------------------------
OPatch Session completed with warnings.

OPatch completed with warnings.
oracle@ibmp55a1[/u01/5077508]$opatch lsinventory
Invoking OPatch 10.2.0.2.0

Oracle interim Patch Installer version 10.2.0.2.0
Copyright (c) 2005, Oracle Corporation.  All rights reserved..

Oracle Home       : /u01/product
Central Inventory : /u01/oraInventory
   from           : /u01/product/oraInst.loc
OPatch version    : 10.2.0.2.0
OUI version       : 10.2.0.2.0
OUI location      : /u01/product/oui
Log file location : /u01/product/cfgtoollogs/opatch/opatch-00_Dec_21_14-40-04-GMT+08_Thu.log

Lsinventory Output file location : /u01/product/cfgtoollogs/opatch/lsinv/lsinventory-00_Dec_21_14-40-04-GMT+08_Thu.txt
--------------------------------------------------------------------------------
Installed Top-level Products (2): 
Oracle Database 10g                                                  10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 1                            10.2.0.2.0
There are 2 products installed in this Oracle Home.
Interim patches (1) :
Patch  5077508      : applied on Thu Dec 21 14:38:42 GMT+08:00 2017
   Created on 3 Mar 2006, 18:19:39 hrs US/Pacific
   Bugs fixed:
     5071492
Rac system comprising of multiple nodes
  Local node = ibmp55a1
  Remote node = ibmp55a2
--------------------------------------------------------------------------------
OPatch succeeded.
oracle@ibmp55a1[/u01/5077508]$

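I include this much log output to describe the problem as accurately as possible; it is also the only data I had. After the patch was applied on node 1 and then on node 2, node 2 could not be started, and later re-applying the patch with opatch -local did not help either.
1. First tried stopping node 1 and restarting node 2: same symptom.
2. Rolled back the one-off patch: same symptom.
3. Checked the size, last-modified time, and permissions of the oracle binary: identical on both nodes.
4. Tried relinking manually on the problem node.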

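First compare whether the number and sizes of the library files under $ORACLE_HOME/rdbms/lib match on the two nodes. If they do not, copy the files under lib from the healthy node to the problem node.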
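
One way to make this comparison repeatable is to write a manifest of the directory on each node and diff the two. The sketch below is an assumption on my part rather than what was run in this case: `lib_manifest` is a hypothetical helper, and it uses the POSIX `cksum` utility so that files with the same size but different content also show up.

```shell
# Print "checksum size name" for every regular file in a directory,
# sorted by file name so two manifests can be compared with diff.
# (lib_manifest is an illustrative name, not an Oracle tool.)
lib_manifest() {
    (
        cd "$1" || exit 1
        for f in *; do
            if [ -f "$f" ]; then
                cksum "$f"
            fi
        done
    ) | sort -k 3
}

# Example (file names are placeholders):
#   on node 1:  lib_manifest "$ORACLE_HOME/rdbms/lib" > node1.manifest
#   on node 2:  lib_manifest "$ORACLE_HOME/rdbms/lib" > node2.manifest
#   then copy one manifest across and run:  diff node1.manifest node2.manifest
```

Any line reported by diff names a file that is missing on one node or differs in size or checksum.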

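This version has bug 5128575, in which libknlopt.a is inconsistent between nodes. Try copying $ORACLE_HOME/rdbms/lib/libknlopt.a from the healthy node to the problem node. This archive records which options are enabled in the oracle binary. For example, to check whether RAC is enabled: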
On Linux/UNIX except AIX:

ar -t $ORACLE_HOME/rdbms/lib/libknlopt.a|grep kcsm.o

On AIX:

ar -X32_64 -t $ORACLE_HOME/rdbms/lib/libknlopt.a|grep kcsm.o

To disable the RAC option, you can use:

$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_off  ioracle

Starting with 11.2 there is a new tool, chopt, that can also enable and disable options. For example, to disable the partitioning option:

$ chopt disable partitioning

In this case libknlopt.a was the same size on both nodes, but I still copied libknlopt.a from the healthy node to the problem node and then relinked:

cd $ORACLE_HOME/rdbms/lib 
make -f ins_rdbms.mk ioracle

Starting the problem node again then succeeded, and the ORA-600 [kclfadd_1] problem was resolved as well.
