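ORA-600 [kclfadd_1]: one node fails to start after installing a one-off patch
A friend's database instance crashed with ORA-600 [kclfadd_1]. The database is a 2-node RAC running 10.2.0.2, and he asked me to help analyze the cause. After the bug was identified, a one-off patch was installed instead of upgrading the version. Afterwards the instance on node 2 would not start: according to the alert log, startup reported several background processes starting and then hung without any error once the LCK0 process had started. Node 1 was fine. This post records the analysis.
The ORA-600 [kclfadd_1] error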
Tue Dec 19 09:23:52 2017
Completed: ALTER DATABASE OPEN
Tue Dec 19 09:24:46 2017
Shutting down archive processes
Tue Dec 19 09:24:51 2017
ARCH shutting down
ARC2: Archival stopped
Tue Dec 19 09:53:07 2017
Errors in file /u01/admin/anbob/udump/anbob1_ora_389530.trc:
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:17 2017
Errors in file /u01/admin/anbob/udump/anbob1_ora_389530.trc:
ORA-00600: 内部错误代码, 参数: [504], [0x700000010026AD8], [10], [2], [compile environment latch], [0], [0], [0x000000000]
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:29 2017
Trace dumping is performing id=[cdmp_20171219095329]
Tue Dec 19 09:53:31 2017
Errors in file /u01/admin/anbob/udump/anbob1_ora_389530.trc:
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:41 2017
Errors in file /u01/admin/anbob/udump/anbob1_ora_389530.trc:
ORA-00600: 内部错误代码, 参数: [504], [0x700000010026AD8], [10], [2], [compile environment latch], [0], [0], [0x000000000]
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
ORA-00600: 内部错误代码, 参数: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:53 2017
Trace dumping is performing id=[cdmp_20171219095353]
Tue Dec 19 09:53:56 2017
Errors in file /u01/admin/anbob/bdump/anbob1_pmon_516246.trc:
ORA-00600: internal error code, arguments: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:57 2017
Errors in file /u01/admin/anbob/bdump/anbob1_pmon_516246.trc:
ORA-00600: internal error code, arguments: [kclfadd_1], [], [], [], [], [], [], []
Tue Dec 19 09:53:57 2017
PMON: terminating instance due to error 472
MOS confirms that this is a bug:
ORA-00600: internal error code, arguments: [kclfadd_1], [], [], [], [], [], [], []
Cause
This problem only affects 64bit releases of Oracle.
In a RAC environment a global cache element structure can
become corrupted leading to various ORA-600 and instance crashes.
Solution
It is recommended to:
1) Apply a Patch Set where this issue is fixed (10.2.0.3 onwards)
or
2) Apply one off patch for your platform if it’s available, please check PATCH 5071492.
Installing the patch for bug 5071492 (interim patch 5077508)
Below is the log from applying the one-off patch:
oracle@ibmp55a1[/u01/5077508]$opatch apply
Invoking OPatch 10.2.0.2.0
Oracle interim Patch Installer version 10.2.0.2.0
Copyright (c) 2005, Oracle Corporation. All rights reserved..
Oracle Home : /u01/product
Central Inventory : /u01/oraInventory
from : /u01/product/oraInst.loc
OPatch version : 10.2.0.2.0
OUI version : 10.2.0.2.0
OUI location : /u01/product/oui
Log file location : /u01/product/cfgtoollogs/opatch/opatch-00_Dec_21_14-38-26-GMT+08_Thu.log
ApplySession applying interim patch '5077508' to OH '/u01/product'
Invoking fuser to check for active processes.
Invoking fuser on "/u01/product/bin/oracle"
OPatch detected the node list and the local node from the inventory. OPatch will patch the local system then propagate the patch to the remote nodes.
This node is part of an Oracle Real Application Cluster.
Remote nodes: 'ibmp55a2'
Local node: 'ibmp55a1'
Please shutdown Oracle instances running out of this ORACLE_HOME on the local system.
(Oracle Home = '/u01/product')
Is the local system ready for patching?
Do you want to proceed? [y|n]
y
User Responded with: Y
Backing up files and inventory (not for auto-rollback) for the Oracle Home
Backing up files affected by the patch '5077508' for restore. This might take a while...
Backing up files affected by the patch '5077508' for rollback. This might take a while...
Patching component oracle.rdbms, 10.2.0.2.0...
Updating archive file "/u01/product/lib/libserver10.a" with "lib/libserver10.a/kjb.o"
Running make for target ioracle
ApplySession adding interim patch '5077508' to inventory
Verifying the update...
Inventory check OK: Patch ID 5077508 is registered in Oracle Home inventory with proper meta-data.
Files check OK: Files from Patch ID 5077508 are present in Oracle Home.
The local system has been patched. You can restart Oracle instances on it.
Patching in rolling mode.
The node 'ibmp55a2' will be patched next.
Please shutdown Oracle instances running out of this ORACLE_HOME on 'ibmp55a2'.
(Oracle Home = '/u01/product')
Is the node ready for patching?
Do you want to proceed? [y|n]
n
User Responded with: N
ApplySession exits on request
You may exit the patching session and patch remaining nodes later from an un-patched node.
Do you want to continue?
Do you want to proceed? [y|n]
y
User Responded with: Y
Updating nodes 'ibmp55a2'
Apply-related files are:
FP = "/u01/product/.patch_storage/5077508_Mar_3_2006_18_19_39/rac/copy_files.txt"
DP = "/u01/product/.patch_storage/5077508_Mar_3_2006_18_19_39/rac/copy_dirs.txt"
MP = "/u01/product/.patch_storage/5077508_Mar_3_2006_18_19_39/rac/make_cmds.txt"
Propagating files to remote nodes...
Propagating directories to remote nodes...
Running command on remote node 'ibmp55a2':
cd /u01/product/rdbms/lib; /usr/ccs/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/product || echo REMOTE_MAKE_FAILED::>&2
--------------------------------------------------------------------------------
WARNING for re-link on remote node 'ibmp55a2':
OPatch completed the command 'cd /u01/product/rdbms/lib; /usr/ccs/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/product || echo REMOTE_MAKE_FAILED::>&2 ' with warnings.
This command is from the file '/u01/product/.patch_storage/5077508_Mar_3_2006_18_19_39/rac/make_cmds.txt.instantiated', line number '1' Probable cause: chmod 755 /u01/product/bin - Linking Oracle rm -f /u01/product/rdbms/lib/oracle ld -b64 -o /u01/product/rdbms/lib/oracle -L/u01/product/rdbms/lib/ -L/u01/product/lib/ -bbigtoc -bnoipath -bI:/u01/product/lib/ksms.imp /u01/product/rdbms/lib/opimai.o /u01/product/rdbms/lib/ssoraed.o /u01/product/rdbms/lib/ttcsoi.o -lperfsrv10 /u01/product/lib/nautab.o /u01/product/lib/naeet.o /u01/product/lib/naect.o /u01/product/lib/naedhs.o /u01/product/rdbms/lib/config.o -bI:/usr/lib/aio.exp -lserver10 /u01/product/lib/libodm10.so -lnnet10 -lskgxp10 -lsthasgen10 /u01/product/has/lib/clssgc.o /u01/product/lib/libstskgxn2.a -lstocr10 -lstocrb10 -lstocrutl10 -lsthasgen10 /u01/product/has/lib/clssgc.o /u01/product/lib/libstskgxn2.a -lclient10 -lvsn10 -lcommon10 -lgeneric10 /u01/product/rdbms/lib/defopt.o -lknlopt `if /bin/ar -X64 tv /u01/product/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap10 -bE:/u01/product/rdbms/lib/olap.exp" ; fi` -lslax10 -lpls10 -lplp10 -bE:/u01/product/rdbms/lib/plsqlncomp.exp /u01/product/lib/libstclsra10.a -lstdbcfg10 -lserver10 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lknlopt -lslax10 -lpls10 -lplp10 -ljox10 -bE:/u01/product/rdbms/lib//oracle.exp `sed -e 's/-ljava//g' /u01/product/lib/ldflags` -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnzjs10 -lnro10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags` -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnzjs10 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lmm -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags` -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnzjs10 -lnro10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags` -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnzjs10 -lclient10 -lvsn10 
-lcommon10 -lgeneric10 -lpls10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lserver10 `if /bin/ar -X64 tv /u01/product/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo10"; fi` -lctxc10 -lctx10 -lzx10 -lgx10 -lctx10 -lzx10 -lgx10 -lordimt10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lsnls10 -lunls10 -bE:/u01/product/rdbms/lib//libcorejava.exp -lld -lm `cat /u01/product/lib/sysliblist` -lm `if [ "\`/usr/bin/uname -v\`" = "4" ]; \ then echo "-bI:/u01/product/lib/pw-syscall.exp"; fi;` `if /bin/ar -X64 t /u01/product/rdbms/lib/libknlopt.a | grep '^'kcsm.o > /dev/null 2>&1 ; then echo "-lha_gs_r -lha_em_r -lpthreads"; fi` -locijdbcst10 -lwwg -bpT:0x100000000 -bpD:0x110000000 -bforceimprw mv -f /u01/product/bin/oracle /u01/product/bin/oracleO mv /u01/product/rdbms/lib/oracle /u01/product/bin/oracle chmod 6751 /u01/product/bin/oracleld: 0711-415 WARNING: Symbol plzcls is already exported.ld: 0711-415 WARNING: Symbol plzexe is already exported.ld: 0711-415 WARNING: Symbol plzopn is already exported.ld: ... 0711-783 WARNING: TOC overflow. TOC size: 141728 Maximum size: 65536 Extra instructions are being generated for each reference to a TOC symbol if the symbol is in the TOC overflow area. :failed The node 'ibmp55a2' has been patched. You can restart Oracle instances on it. There were relinks on remote nodes. Remember to check the binary size and timestamp on the nodes 'ibmp55a2' . 
The following make commands were invoked on remote nodes: 'cd /u01/product/rdbms/lib; /usr/ccs/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/product ' -------------------------------------------------------------------------------- The following warnings have occurred during OPatch execution: 1) OUI-67212: -------------------------------------------------------------------------------- WARNING for re-link on remote node 'ibmp55a2': OPatch completed the command 'cd /u01/product/rdbms/lib; /usr/ccs/bin/make -f ins_rdbms.mk ioracle ORACLE_HOME=/u01/product || echo REMOTE_MAKE_FAILED::>&2 ' with warnings. This command is from the file '/u01/product/.patch_storage/5077508_Mar_3_2006_18_19_39/rac/make_cmds.txt.instantiated', line number '1' Probable cause: chmod 755 /u01/product/bin - Linking Oracle rm -f /u01/product/rdbms/lib/oracle ld -b64 -o /u01/product/rdbms/lib/oracle -L/u01/product/rdbms/lib/ -L/u01/product/lib/ -bbigtoc -bnoipath -bI:/u01/product/lib/ksms.imp /u01/product/rdbms/lib/opimai.o /u01/product/rdbms/lib/ssoraed.o /u01/product/rdbms/lib/ttcsoi.o -lperfsrv10 /u01/product/lib/nautab.o /u01/product/lib/naeet.o /u01/product/lib/naect.o /u01/product/lib/naedhs.o /u01/product/rdbms/lib/config.o -bI:/usr/lib/aio.exp -lserver10 /u01/product/lib/libodm10.so -lnnet10 -lskgxp10 -lsthasgen10 /u01/product/has/lib/clssgc.o /u01/product/lib/libstskgxn2.a -lstocr10 -lstocrb10 -lstocrutl10 -lsthasgen10 /u01/product/has/lib/clssgc.o /u01/product/lib/libstskgxn2.a -lclient10 -lvsn10 -lcommon10 -lgeneric10 /u01/product/rdbms/lib/defopt.o -lknlopt `if /bin/ar -X64 tv /u01/product/rdbms/lib/libknlopt.a | grep xsyeolap.o > /dev/null 2>&1 ; then echo "-loraolap10 -bE:/u01/product/rdbms/lib/olap.exp" ; fi` -lslax10 -lpls10 -lplp10 -bE:/u01/product/rdbms/lib/plsqlncomp.exp /u01/product/lib/libstclsra10.a -lstdbcfg10 -lserver10 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lknlopt -lslax10 -lpls10 -lplp10 -ljox10 -bE:/u01/product/rdbms/lib//oracle.exp `sed -e 's/-ljava//g' 
/u01/product/lib/ldflags` -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnzjs10 -lnro10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags` -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnzjs10 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lmm -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags` -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnzjs10 -lnro10 `sed -e 's/-ljava//g' /u01/product/lib/ldflags` -lncrypt10 -lnsgr10 -lnzjs10 -ln10 -lnnz10 -lnl10 -lnzjs10 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lpls10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lclient10 -lvsn10 -lcommon10 -lgeneric10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lserver10 `if /bin/ar -X64 tv /u01/product/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo10"; fi` -lctxc10 -lctx10 -lzx10 -lgx10 -lctx10 -lzx10 -lgx10 -lordimt10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lcore10 -lsnls10 -lnls10 -lxml10 -lcore10 -lunls10 -lsnls10 -lnls10 -lcore10 -lnls10 -lsnls10 -lunls10 -bE:/u01/product/rdbms/lib//libcorejava.exp -lld -lm `cat /u01/product/lib/sysliblist` -lm `if [ "\`/usr/bin/uname -v\`" = "4" ]; \ then echo "-bI:/u01/product/lib/pw-syscall.exp"; fi;` `if /bin/ar -X64 t /u01/product/rdbms/lib/libknlopt.a | grep '^'kcsm.o > /dev/null 2>&1 ; then echo "-lha_gs_r -lha_em_r -lpthreads"; fi` -locijdbcst10 -lwwg -bpT:0x100000000 -bpD:0x110000000 -bforceimprw mv -f /u01/product/bin/oracle /u01/product/bin/oracleO mv /u01/product/rdbms/lib/oracle /u01/product/bin/oracle chmod 6751 /u01/product/bin/oracle ld: 0711-415 WARNING: Symbol plzcls is ... ... /lib//libserver10.a[dmbu.o], imported symbol _DBLINF Symbol was expected to be local. 
Extra instructions are being generated to reference the symbol.
ld: 0711-783 WARNING: TOC overflow. TOC size: 141728 Maximum size: 65536 Extra instructions are being generated for each reference to a TOC symbol if the symbol is in the TOC overflow area.
:failed
--------------------------------------------------------------------------------
OPatch Session completed with warnings.
OPatch completed with warnings.
oracle@ibmp55a1[/u01/5077508]$opatch lsinventory
Invoking OPatch 10.2.0.2.0
Oracle interim Patch Installer version 10.2.0.2.0
Copyright (c) 2005, Oracle Corporation. All rights reserved..
Oracle Home : /u01/product
Central Inventory : /u01/oraInventory
from : /u01/product/oraInst.loc
OPatch version : 10.2.0.2.0
OUI version : 10.2.0.2.0
OUI location : /u01/product/oui
Log file location : /u01/product/cfgtoollogs/opatch/opatch-00_Dec_21_14-40-04-GMT+08_Thu.log
Lsinventory Output file location : /u01/product/cfgtoollogs/opatch/lsinv/lsinventory-00_Dec_21_14-40-04-GMT+08_Thu.txt
--------------------------------------------------------------------------------
Installed Top-level Products (2):
Oracle Database 10g 10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 1 10.2.0.2.0
There are 2 products installed in this Oracle Home.
Interim patches (1) :
Patch 5077508 : applied on Thu Dec 21 14:38:42 GMT+08:00 2017
Created on 3 Mar 2006, 18:19:39 hrs US/Pacific
Bugs fixed:
5071492
Rac system comprising of multiple nodes
Local node = ibmp55a1
Remote node = ibmp55a2
--------------------------------------------------------------------------------
OPatch succeeded.
oracle@ibmp55a1[/u01/5077508]$
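I include this lengthy log to describe the problem as accurately as possible; it is also the only data I was given. The patch was applied on node 1 first and then on node 2, and node 2 could not be started afterwards, even after later attempts to repair the patch in -local mode.
1, First tried shutting down node 1 and restarting node 2: same symptom.
2, Rolled back the one-off patch: same symptom.
3, Checked the size, last-modified date, and permissions of the oracle binary: identical on both nodes.
4, Tried relinking manually on the problem node.
First compare whether the number and size of the library files under $ORACLE_HOME/rdbms/lib match between the two nodes. If they do not, copy the files under lib from the healthy node to the problem node.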
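The comparison above can be sketched roughly as follows. This is only an illustration, not the procedure actually used in this case: the function name lib_manifest and the /tmp file names are hypothetical. It uses cksum rather than size alone, because two files of equal size can still differ in content.

```shell
#!/bin/sh
# Sketch: print one "checksum size name" line per regular file in a directory,
# sorted by file name, so the output from two nodes can be compared with diff.
# lib_manifest is a hypothetical helper name, not an Oracle tool.

lib_manifest() {
    # $1 = directory to inventory, e.g. $ORACLE_HOME/rdbms/lib
    dir=$1
    (
        cd "$dir" || exit 1
        for f in *; do
            # skip subdirectories and the unexpanded "*" of an empty directory
            [ -f "$f" ] || continue
            # cksum prints: <checksum> <size in bytes> <file name>
            cksum "$f"
        done | sort -k 3
    )
}

# Usage (file names under /tmp are illustrative):
#   healthy node:  lib_manifest "$ORACLE_HOME/rdbms/lib" > /tmp/lib_node1.txt
#   problem node:  lib_manifest "$ORACLE_HOME/rdbms/lib" > /tmp/lib_node2.txt
#   copy one manifest to the other node, then:
#   diff /tmp/lib_node1.txt /tmp/lib_node2.txt
```

Any line reported by diff points at a file that is missing on one node or differs in size or content.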
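The current version has bug 5128575, in which the libknlopt.a file is inconsistent. Try copying $ORACLE_HOME/rdbms/lib/libknlopt.a from the healthy node to the problem node; this file records which options are enabled in the oracle binary. For example, to check whether RAC is enabled: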
On Linux/UNIX except AIX:
ar -t $ORACLE_HOME/rdbms/lib/libknlopt.a|grep kcsm.o
On AIX:
ar -X32_64 -t $ORACLE_HOME/rdbms/lib/libknlopt.a|grep kcsm.o
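The two platform variants above could be wrapped in one small script. This is a sketch under assumptions: the function names are made up for illustration, and the only fact it relies on is the one stated above, that kcsm.o is present in libknlopt.a when RAC is enabled.

```shell
#!/bin/sh
# Sketch: report whether kcsm.o (the RAC option object) is in libknlopt.a.
# All function names below are hypothetical helpers.

has_rac_member() {
    # Reads archive member names on stdin; succeeds if kcsm.o is one of them.
    grep -q '^kcsm\.o$'
}

list_knlopt_members() {
    # $1 = path to libknlopt.a; AIX needs -X32_64 to read 64-bit archives.
    if [ "$(uname -s)" = "AIX" ]; then
        ar -X32_64 -t "$1"
    else
        ar -t "$1"
    fi
}

rac_option_status() {
    # $1 = archive path; defaults to the copy under $ORACLE_HOME
    lib=${1:-$ORACLE_HOME/rdbms/lib/libknlopt.a}
    # Fail loudly if ar cannot read the archive, rather than reporting "off".
    members=$(list_knlopt_members "$lib") || {
        echo "cannot read $lib" >&2
        return 2
    }
    if printf '%s\n' "$members" | has_rac_member; then
        echo "RAC option enabled"
    else
        echo "RAC option not enabled"
    fi
}

# Usage: run rac_option_status on each node and compare the answers.
```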
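To disable the RAC option, you can use:
$ cd $ORACLE_HOME/rdbms/lib
$ make -f ins_rdbms.mk rac_off ioracle
Starting with 11.2 there is a new tool, chopt, that can likewise enable and disable options. For example, to disable the partitioning option: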
$ chopt disable partitioning
In this case libknlopt.a was the same size on both nodes, but I still copied the file from the healthy node to the problem node and then relinked:
cd $ORACLE_HOME/rdbms/lib
make -f ins_rdbms.mk ioracle
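A slightly more defensive version of the copy-and-relink step might look like the sketch below. It assumes libknlopt.a from the healthy node has already been transferred to a local path, and it keeps a timestamped backup of the original archive first. The function name and the MAKE override (handy for a dry run) are illustrative assumptions, not part of the original procedure.

```shell
#!/bin/sh
# Sketch: replace libknlopt.a with a copy taken from the healthy node, keeping
# a backup, then relink the oracle binary. Run it as the Oracle software owner.
# relink_with_healthy_knlopt and the MAKE override are hypothetical.

relink_with_healthy_knlopt() {
    healthy_copy=$1                           # libknlopt.a fetched from the healthy node
    lib_dir=${2:-$ORACLE_HOME/rdbms/lib}      # directory holding the local archive
    target=$lib_dir/libknlopt.a

    [ -f "$healthy_copy" ] || { echo "missing $healthy_copy" >&2; return 1; }

    # Keep the original archive so the change can be undone.
    cp -p "$target" "$target.bak.$(date +%Y%m%d%H%M%S)" || return 1
    cp -p "$healthy_copy" "$target" || return 1

    # Same relink command as in the article; set MAKE=true to skip the relink.
    ( cd "$lib_dir" && ${MAKE:-make} -f ins_rdbms.mk ioracle )
}

# Usage (the /tmp path is illustrative):
#   relink_with_healthy_knlopt /tmp/libknlopt.a.node1
```

The backup copy makes it possible to restore the previous archive and relink again if the change does not help.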
After that, the problem node started normally, and the ORA-600 [kclfadd_1] issue was resolved as well.