Troubleshooting Oracle 19c RAC ORA-29770 with LMD hang, LMHB terminating the instance
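A while ago, one node of an Oracle 19c RAC cluster restarted unexpectedly. The alert log showed that the LMD process had hung and missed its heartbeat for more than 70 seconds, so the LMHB process terminated the instance. Operating system resources were largely idle at the time, and the LMHB trace confirmed that LMD was in the middle of a free-memory operation.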
DB alert log
2023-02-22T15:43:36.739754+08:00
Thread 2 advanced to log sequence 6653 (LGWR switch), current SCN: 19860977446380
Current log# 18 seq# 6653 mem# 0: +DATA/anbob/ONLINELOG/group_18.958.1121127477
2023-02-22T15:43:37.381412+08:00
ARC1 (PID:382736): Archived Log entry 15101 added for T-2.S-6652 ID 0xa5aa2106 LAD:1
2023-02-22T15:53:13.704691+08:00
LMD1 (ospid: 382285) has not called a wait for 81 secs.
2023-02-22T15:53:17.140819+08:00
Errors in file /u01/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_lmhb_382315.trc (incident=205480) (PDBNAME=CDB$ROOT):
ORA-29770: global enqueue process LMD1 (OSID 382285) is hung for more than 70 seconds
Incident details in: /u01/app/oracle/diag/rdbms/anbob/anbob2/incident/incdir_205480/anbob2_lmhb_382315_i205480.trc
2023-02-22T15:53:21.087390+08:00
LOCK_DBGRP: GCR_SYSTEST debug event locked group GR+DB_anbob by memno 1
LMHB (ospid: 382315): terminating the instance due to ORA error 29770
Cause - 'ERROR: Some process(s) is not making progress.
LMHB (ospid: 382315) is terminating the instance.
Please check LMHB trace file for more details.
Please also check the CPU load, I/O load and other system properties for anomalous behavior
ERROR: Some process('
2023-02-22T15:53:22.273416+08:00
ORA-1092 : opitsk aborting process
2023-02-22T15:53:23.638753+08:00
License high water mark = 4184
2023-02-22T15:53:24.546319+08:00
Dumping diagnostic data in directory=[cdmp_20230222155321], requested by (instance=2, osid=382315 (LMHB)), summary=[abnormal instance termination].
2023-02-22T15:53:27.517019+08:00
Instance terminated by LMHB, pid = 382315
2023-02-22T15:53:28.731204+08:00
Warning: 2 processes are still attacheded to shmid 1998852:
(size: 81920 bytes, creator pid: 381966, last attach/detach pid: 382133)
2023-02-22T15:53:29.639831+08:00
USER(prelim) (ospid: 260076): terminating the instance
2023-02-22T15:53:29.643403+08:00
Instance terminated by USER(prelim), pid = 260076
2023-02-22T15:53:32.764711+08:00
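When scanning a long alert log for this pattern, it can help to pull out only the LMHB hang messages together with their timestamps. The following is a minimal sketch, assuming the message formats are exactly those shown in the excerpt above; the function name `find_hang_events` and the `SAMPLE` text are illustrative and not part of any Oracle tooling.

```python
import re

# Message formats copied from the alert log excerpt in this post (assumption:
# other releases may word these lines differently).
TS_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2}$")
NO_WAIT_RE = re.compile(
    r"^(\w+) \(ospid: (\d+)\) has not called a wait for (\d+) secs\."
)
HUNG_RE = re.compile(
    r"^ORA-29770: global enqueue process (\w+) \(OSID (\d+)\) "
    r"is hung for more than (\d+) seconds"
)


def find_hang_events(lines):
    """Return (timestamp, kind, process, ospid, seconds) for each hang message."""
    events = []
    current_ts = None
    for raw in lines:
        line = raw.strip()
        if TS_RE.match(line):
            # Alert log entries are preceded by a timestamp line of their own.
            current_ts = line
            continue
        for kind, pattern in (("no_wait", NO_WAIT_RE), ("hung", HUNG_RE)):
            m = pattern.match(line)
            if m:
                events.append(
                    (current_ts, kind, m.group(1), int(m.group(2)), int(m.group(3)))
                )
    return events


# Hypothetical usage with two entries taken from the excerpt above.
SAMPLE = """2023-02-22T15:53:13.704691+08:00
LMD1 (ospid: 382285) has not called a wait for 81 secs.
2023-02-22T15:53:17.140819+08:00
ORA-29770: global enqueue process LMD1 (OSID 382285) is hung for more than 70 seconds
""".splitlines()

for event in find_hang_events(SAMPLE):
    print(event)
```

Both events carry the same process name and OS pid (LMD1, 382285), which is the pid to look for in the OS-level data.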
OS top
zzz ***Wed Feb 22 15:52:21 CST 2023
top up 124 days, 23:12, 0 users, load average: 4.11, 4.27, 4.46
Tasks: 3206 total, 2 running, 3204 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.9 us, 0.5 sy, 0.0 ni, 98.6 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 79094534+total, 24925899+free, 42708208+used, 11460425+buff/cache
KiB Swap: 33554428 total, 33554428 free, 0 used. 34671769+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
382285 oracle 20 0 0.254t 46128 34092 R 100.0 0.0 814:58.30 ora_lmd1_+
382315 oracle 20 0 0.246t 31544 25232 S 59.4 0.0 157:18.50 ora_lmhb_+
257900 grid 20 0 112532 5364 692 S 11.3 0.0 0:00.23 pidstat
257904 grid 20 0 112532 5364 692 S 11.3 0.0 0:00.23 pidstat
394798 oracle 20 0 0.254t 57216 47648 S 6.6 0.0 5:23.63 oracle_39+
257929 grid 20 0 160804 5428 1536 R 5.7 0.0 0:00.14 top
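Although the host as a whole is 98.6% idle, the top snapshot shows ora_lmd1 in state R at 100% CPU, which suggests a single process spinning on one core rather than a starved system. A minimal sketch for screening top process rows for that situation follows; it assumes the default 12-column top layout shown in this post, and the function name `busy_processes` and the 90% threshold are illustrative choices.

```python
def busy_processes(rows, cpu_threshold=90.0):
    """Return (pid, command, cpu_pct, cpu_time) for running processes above the threshold.

    Assumes the column order PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND.
    """
    hits = []
    for row in rows:
        fields = row.split()
        # Skip header and summary lines: a process row starts with a numeric pid.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        state = fields[7]
        cpu_pct = float(fields[8])
        if state == "R" and cpu_pct >= cpu_threshold:
            hits.append((int(fields[0]), fields[11], cpu_pct, fields[10]))
    return hits


# Process rows copied from the top snapshot in this post.
TOP_ROWS = [
    "PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND",
    "382285 oracle 20 0 0.254t 46128 34092 R 100.0 0.0 814:58.30 ora_lmd1_+",
    "382315 oracle 20 0 0.246t 31544 25232 S 59.4 0.0 157:18.50 ora_lmhb_+",
    "257929 grid 20 0 160804 5428 1536 R 5.7 0.0 0:00.14 top",
]

print(busy_processes(TOP_ROWS))
```

Only the LMD1 row passes both filters: LMHB is busy but sleeping, and top itself is running but far below the threshold.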
LMHB trace
voluntary_ctxt_switches: 1146306377
nonvoluntary_ctxt_switches: 680777
Short stack dump:
ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-kjr_freeable_chunk_free()+2925
<-kjrchc()+9283<-kjmdmain_helper()+6258<-kjmdm()+74<-ksbrdp()+1167<-opirip()+541
<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245
ksdxcb()+872 kernel service debug internal errors ksdx callback for sosd layer signal handler
sspuser()+223 operating system dependent system process management handle SIGUSR2 for Oracle
__sighandler() (?) [partial hit for: ]
kjr_freeable_chunk_free()+2925 Kernel lock management Resource table [partial hit for: kjr ] free memory
kjrchc()+9283 kernel lock management resource table [partial hit for: kjr ]
kjmdmain_helper()+6258 kernel lock management RAC multiple LMS [partial hit for: kjm ]
kjmdm()+74 kernel lock management RAC multiple LMS [partial hit for: kjm ]
ksbrdp()+1167 kernel service background processes run a detached background process
The error and the call stack match Bug 32076305: ORA-29770 LMD has no heartbeats - LMD Stack is in kjr_freeable_chunk_free
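Comparing a short stack against a known signature by eye is error-prone when several traces are involved, so the check can be sketched in code. The example below splits a short stack dump into function names and tests whether a set of frames appears in order. The signature tuple is an assumption derived from the trace in this post and from the bug title, not an official definition from Oracle; `parse_short_stack` and `matches_signature` are hypothetical helper names.

```python
def parse_short_stack(text):
    """Split an Oracle short stack dump into a list of function names, innermost first."""
    frames = []
    for part in text.replace("\n", "").split("<-"):
        # Each frame looks like name()+offset; keep only the name.
        name = part.strip().split("(")[0]
        if name:
            frames.append(name)
    return frames


def matches_signature(frames, signature):
    """Return True if the signature frames occur in the stack in the same order."""
    remaining = iter(frames)
    # "name in remaining" consumes the iterator up to the match, which
    # enforces ordering across successive signature frames.
    return all(name in remaining for name in signature)


# Assumed signature, taken from the LMD1 stack shown in this post.
LMD_FREE_SIGNATURE = ("kjr_freeable_chunk_free", "kjrchc", "kjmdmain_helper")

STACK = """ksedsts()+426<-ksdxfstk()+58<-ksdxcb()+872<-sspuser()+223<-__sighandler()<-kjr_freeable_chunk_free()+2925
<-kjrchc()+9283<-kjmdmain_helper()+6258<-kjmdm()+74<-ksbrdp()+1167<-opirip()+541
<-opidrv()+581<-sou2o()+165<-opimai_real()+173<-ssthrdmain()+417<-main()+256<-__libc_start_main()+245"""

frames = parse_short_stack(STACK)
print(matches_signature(frames, LMD_FREE_SIGNATURE))
```

A match here is only a hint that the bug is worth checking against; confirming it still requires comparing the full symptoms with the bug note.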
Solution
Apply the one-off patch for this bug, if one exists for your release
or
Upgrade to the 19.14 RU or later
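To see quickly whether an instance is already at the release update named above, the version string can be compared numerically. This is a minimal sketch, assuming the version is available as a dotted string such as the one reported in the VERSION_FULL column of V$INSTANCE; the function name `ru_at_least` and the sample values are illustrative. It does not detect a one-off patch applied on top of an older RU.

```python
def ru_at_least(version_full, minimum=(19, 14)):
    """Return True if the major.RU part of a dotted version string is >= minimum."""
    # Compare as integers so that 19.3 sorts below 19.14.
    major_ru = tuple(int(part) for part in version_full.split(".")[:2])
    return major_ru >= minimum


# Hypothetical version strings for illustration.
for version in ("19.3.0.0.0", "19.13.0.0.0", "19.14.0.0.0", "19.18.0.0.0"):
    print(version, ru_at_least(version))
```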