Troubleshooting ORA-07445 [skgxpdmpmem()+22096]
Last month one of the nodes restarted. The environment is a two-node 10.2.0.5 RAC on HP-UX 11.31 ia64. Here is a brief record of what happened.
SQL> select startup_time from gv$instance;

STARTUP_TIME
-------------------
2014-12-21 20:55:58
2013-08-05 23:21:21
# alert
Sun Dec 21 20:55:41 EAT 2014
Errors in file /opt/oracle/app/admin/anbob/bdump/anbob1_lms0_13499.trc:
ORA-07445: exception encountered: core dump [skgxpdmpmem()+22096] [SIGSEGV] [Address not mapped to object] [0x0002D0B8C] [] []
Sun Dec 21 20:55:43 EAT 2014
Trace dumping is performing id=[cdmp_20141221205543]
Sun Dec 21 20:55:45 EAT 2014
Errors in file /opt/oracle/app/admin/anbob/bdump/anbob1_pmon_13442.trc:
ORA-00484: LMS* process terminated with error
Sun Dec 21 20:55:45 EAT 2014
PMON: terminating instance due to error 484
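As a rough aid for reading the sequence above, the ORA- errors can be pulled out of the alert log together with their timestamps. The following is only a sketch in Python: the function name `ora_timeline` and the two regular expressions are illustrative assumptions, and the sample text is the alert excerpt above split one entry per line.

```python
import re
from datetime import datetime

# Alert log excerpt from this incident, one entry per line.
ALERT_EXCERPT = """\
Sun Dec 21 20:55:41 EAT 2014
Errors in file /opt/oracle/app/admin/anbob/bdump/anbob1_lms0_13499.trc:
ORA-07445: exception encountered: core dump [skgxpdmpmem()+22096] [SIGSEGV] [Address not mapped to object] [0x0002D0B8C] [] []
Sun Dec 21 20:55:43 EAT 2014
Trace dumping is performing id=[cdmp_20141221205543]
Sun Dec 21 20:55:45 EAT 2014
Errors in file /opt/oracle/app/admin/anbob/bdump/anbob1_pmon_13442.trc:
ORA-00484: LMS* process terminated with error
Sun Dec 21 20:55:45 EAT 2014
PMON: terminating instance due to error 484
"""

# Timestamp lines look like "Sun Dec 21 20:55:41 EAT 2014". The time zone
# abbreviation is matched but left out of the captured groups.
TS_RE = re.compile(r"^(\w{3} \w{3} +\d{1,2} \d{2}:\d{2}:\d{2}) \w+ (\d{4})$")
ORA_RE = re.compile(r"\bORA-(\d{5})\b")


def ora_timeline(text):
    """Return [(datetime, 'ORA-nnnnn'), ...] in the order they appear."""
    current = None  # timestamp of the most recent header line
    events = []
    for line in text.splitlines():
        line = line.strip()
        m = TS_RE.match(line)
        if m:
            current = datetime.strptime(
                f"{m.group(1)} {m.group(2)}", "%a %b %d %H:%M:%S %Y"
            )
            continue
        for code in ORA_RE.findall(line):
            events.append((current, f"ORA-{code}"))
    return events


timeline = ora_timeline(ALERT_EXCERPT)
for ts, code in timeline:
    print(ts.isoformat(sep=" "), code)
```

On this excerpt the sketch reports ORA-07445 from LMS0 at 20:55:41 and ORA-00484 from PMON at 20:55:45, that is, the instance was terminated four seconds after the LMS0 core dump.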
# anbob1_lms0_13499.trc
*** 2014-12-21 20:55:41.673
Exception signal: 11 (SIGSEGV), code: 1 (Address not mapped to object), addr: 0x2d0b8c, PC: [0xc00000000bc88fa0, skgxpdmpmem()+22096]
r1: 9ffffffffd7ef2e8    r20: 0                  br5: 0
r2: c00000000bc45bf0    r21: 21c                br6: c00000000042e670
r3: 9fffffff5fb77c00    r22: d08f               br7: c00000000bc6aff0
r4: 0                   r23: 64                 ip: c00000000bc88fa0
r5: c000000000000408    r24: d088               iipa: 0
r6: c0000000000443e0    r25: 9ffffffffce7a968   cfm: 4fa6
r7: 9ffffffffd7f8de8    r26: 163                um: 1a
r8: 9ffffffffc8ba078    r27: fffffffffffffffb   rsc: 1f
r9: 2d0b8c              r28: ffffffffffff2f6d   bsp: 9ffffffffd8006a0
r10: 9ffffffffd07bcf8   r29: d08f               bspstore: 9ffffffffd8006a0
r11: 9ffffffffd7ef530   r30: d095               rnat: 0
r12: 9fffffffffffaf70   r31: 9ffffffffd07bd70   ccv: 0
r13: 9ffffffffd4554b0   NaTs: 0                 unat: 0
r14: 1                  PRs: 28e97              fpsr: 9804c8a74433f
r15: 600000000033ee18   br0: c00000000bc7b6c0   pfs: c000000000000b1d
r16: d08f               br1: c000000000294bc0   lc: 0
r17: 1f8                br2: 0                  ec: 0
r18: 2b3c               br3: 0                  isr: 9ffffffffd8006a0
r19: 0                  br4: 0                  ifa: 0
Reason code: 0008
*** 2014-12-21 20:55:41.684
ksedmp: internal or fatal error
ORA-07445: exception encountered: core dump [skgxpdmpmem()+22096] [SIGSEGV] [Address not mapped to object] [0x0002D0B8C] [] []
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst()+64          call     ksedst1()            000000001 ? 000000001 ?
ksedmp()+2176        call     ksedst()             000000001 ? C000000000000D20 ?
ssexhd()+1264        call     ksedmp()             000000003 ? 6000000000230DA0 ? 60000000000C7420 ?
                     call     ssexhd()             C0000002FF6101D8 ? 60000000000C9570 ?
skgxpdmpmem()+22096  call                          6000000000235200 ? 10000000B ? 6000000000235010 ?
skgxpmcpy()+50848    call     skgxpdmpmem()+22032  9FFFFFFFFD07BC08 ? 0002CE050 ? 60000000003136F0 ?
skgxppost()+29232    call     skgxpmcpy()+50752    9FFFFFFFFD07BC08 ? 60000000002CE050 ?
skgxppost()+4720     call     skgxppost()+28800    60000000002CE050 ? 9FFFFFFFFD7EF2E8 ? 60000000003136F0 ?
skgxpwait()+464      call     skgxppost()+1280     9FFFFFFFFFFFB830 ? 60000000002CEC80 ?
ksxpwait()+3296      call     skgxpwait()          9FFFFFFFFFFFB830 ? 60000000002CE050 ?
$cold_ksliwat()+148  call     ksxpwait()           00000001E ? 000000000 ?
kslwaitns_timed()+1  call     $cold_ksliwat()      000000003 ? 000000001 ?
12                                                 000000035 ? 000000000 ? 000000018 ? 000000000 ?
kskthbwt()+400       call     kslwaitns_timed()    000000003 ? 000000001 ? 000000035 ? 000000000 ? 000000018 ? 000000000 ? 000000000 ? 9FFFFFFFFFFFBB5C ?
kslwait()+640        call     kskthbwt()           000000003 ? 000000035 ? 000000000 ? 000000018 ? 000000000 ? 000000000 ? 00000000A ? 000000000 ?
ksxprcv()+944        call     kslwait()            000000003 ? 000000035 ? 000000000 ? 000000018 ? 000000000 ? 000000000 ?
kjctr_rksxp()+736    call     ksxprcv()            60000000000C6C98 ? 000000018 ? 9FFFFFFFFFFFC710 ?
kjctrcv()+448        call     kjctr_rksxp()        9FFFFFFFFD3BA408 ? C0000002FEF88340 ?
kjcsrmg()+128        call     kjctrcv()            9FFFFFFFFD3BA408 ? C0000002FEF88340 ?
kjmsm()+15152        call     kjcsrmg()            C0000002FCCF6591 ? 9FFFFFFFFFFFCBF4 ? 00002825D ?
ksbrdp()+2368        call     kjmsm()              9FFFFFFFFFFFD2B0 ? 9FFFFFFFFFFFCBF0 ? 000027119 ? 000000000 ?
opirip()+1184        call     ksbrdp()             9FFFFFFFFFFFD2C0 ? 60000000000BA268 ? 60000000000C6C98 ?
opidrv()+1184        call     opirip()             9FFFFFFFFFFFEC00 ? 000000004 ? 9FFFFFFFFFFFF220 ?
sou2o()+240          call     opidrv()             000000032 ? 60000000000C6C98 ? 9FFFFFFFFFFFF220 ?
opimai_real()+336    call     sou2o()              9FFFFFFFFFFFF240 ? 000000032 ? 000000004 ? 9FFFFFFFFFFFF220 ?
main()+240           call     opimai_real()        000000003 ? 000000000 ?
main_opd_entry()+80  call     main()               000000003 ? 9FFFFFFFFFFFF720 ? 60000000000BA268 ? C000000000000004 ?
#vi /opt/oracle/app/admin/anbob/bdump/anbob1_pmon_13442.trc
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining and Real Application Testing options
ORACLE_HOME = /opt/oracle/app/product/10.2.0/db_1
System name: HP-UX
Node name: anbob1
Release: B.11.31
Version: U
Machine: ia64
Instance name: anbob1
Redo thread mounted by this instance: 1
Oracle process number: 2
Unix process pid: 13442, image: oracle@anbob1 (PMON)
*** 2014-12-21 20:55:45.367
*** SERVICE NAME:(SYS$BACKGROUND) 2014-12-21 20:55:45.366
*** SESSION ID:(885.1) 2014-12-21 20:55:45.366
Background process LMS0 found dead
Oracle pid = 7
OS pid (from detached process) = 13499
OS pid (from process state) = 13499
dtp = c000000100e11ec0, proc = c0000002fc552f00
error 484 detected in background process
ORA-00484: LMS* process terminated with error
ksuitm: waiting up to [5] seconds before killing DIAG(13491)
A search of MOS turned up only one bug that closely resembles this case:
Bug 14196801 : ORA-7445 [SKGXPDMPMEM()+52481] INTERMI
----- SQL Statement (None)
----- Current SQL information unavailable
- no cursor.
----- Call Stack Trace -----
skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp <- ssexhd <- <- skgxpdmpmem <- skgxpgetimd <- skgxppost <- skgxpvsnd <- ksxprcvimd <- kjctr_rksxp <- kjctrcv <- kjcsrmg <- kjmsm <- ksbrdp <- opirip <- opidrv <- sou2o <- opimai_real <- main <- main_opd_entry
WORKAROUND?
===========
No
RELATED ISSUES (bugs, forums, RFAs)
===================================
There isn't any bug with identical stack trace :
skgxpdmpmem <- skgxpgetimd <- skgxppost
Other bugs with ora-7445 skgxpdmpmem related to LMS:
Bug 9029091: LNX64-10205-RAC: ORA-600: [KJBLREPLAY:DUP] / ORA-7445:[SKGXPDMPMEM()+11838], LMS
==> duplicate of Bug 8913462: AROLTP-D:LMS1 DIED WITH ORA-600 [KJBLREPLAY:DUP]
==> duplicate of Bug 6961928: ORA-600: INTERNAL ERROR CODE, ARGUMENTS: [KJBCLOSE:SH]
==> all of them show also ORA-600 kjbclose:sh
Bug 9097995: LNX64-10205-RAC: LMS HIT [SKGXPDMPMEM] AND [KJBLREPLAY:DUP],INSTANCE CRASHED
==> also duplicate of Bug 8913492
Bug 9009829: LNX64-10205-RAC: LMS HIT ORA-600 [KCLEXPANDLOCK_2], INSTANCE CRASHED
==> fixed on 10.2.0.5
Bug 8985365: LNX:ETL: ORA-7445 [SKGXPDMPMEM()+11838] [SIGSEGV] [UNKNOWN CODE] [0X000000000]
==> also fixed on 10.2.0.5
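In the end the SR did not pin down an exact bug either; support only said that the case looks very similar. Because 10.2.0.5 is out of support, the issue could not be escalated to the development team in the US and could only be checked against known bugs. The cause is probably related to host resources, but OSW (OSWatcher) was not deployed on this host, so some of the resource data could not be examined.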
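How close the local call stack is to the one published for Bug 14196801 can be checked mechanically. The snippet below is a minimal sketch, not an Oracle tool: the helper names `frames_after_fault` and `common_outer_frames` are made up for illustration, the local stack is the "location" column of anbob1_lms0_13499.trc with offsets dropped, and the bug stack is copied from the bug text as published.

```python
# Short stacks, innermost frame first.
LOCAL_STACK = (
    "ksedst <- ksedmp <- ssexhd <- skgxpdmpmem <- skgxpmcpy <- skgxppost "
    "<- skgxppost <- skgxpwait <- ksxpwait <- $cold_ksliwat <- kslwaitns_timed "
    "<- kskthbwt <- kslwait <- ksxprcv <- kjctr_rksxp <- kjctrcv <- kjcsrmg "
    "<- kjmsm <- ksbrdp <- opirip <- opidrv <- sou2o <- opimai_real <- main "
    "<- main_opd_entry"
)
BUG_STACK = (
    "skdstdst <- ksedst1 <- ksedst <- dbkedDefDump <- ksedmp <- ssexhd <- "
    "<- skgxpdmpmem <- skgxpgetimd <- skgxppost <- skgxpvsnd <- ksxprcvimd "
    "<- kjctr_rksxp <- kjctrcv <- kjcsrmg <- kjmsm <- ksbrdp <- opirip "
    "<- opidrv <- sou2o <- opimai_real <- main <- main_opd_entry"
)


def frames_after_fault(stack):
    """Split a '<-' separated stack and drop everything up to ssexhd."""
    frames = [f.strip() for f in stack.split("<-") if f.strip()]
    # Skip the dump and signal-handler frames so that the list starts
    # at the function that actually faulted.
    return frames[frames.index("ssexhd") + 1:]


def common_outer_frames(a, b):
    """Return the run of identical frames at the outer (caller) end."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


local = frames_after_fault(LOCAL_STACK)
bug = frames_after_fault(BUG_STACK)

shared_tail = common_outer_frames(local, bug)
only_local = [f for f in local if f not in bug]
only_bug = [f for f in bug if f not in local]

print("faulting function:", local[0], "/", bug[0])
print("shared outer frames:", " <- ".join(shared_tail))
print("only in local trace:", ", ".join(only_local))
print("only in bug:", ", ".join(only_bug))
```

On these inputs both stacks fault in skgxpdmpmem and share the LMS receive path from kjctr_rksxp outward, while the frames in between differ (skgxpmcpy and skgxpwait locally, skgxpgetimd and skgxpvsnd in the bug). That is consistent with support calling the bug similar rather than identical.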