
Troubleshooting Oracle: CRS (ASM resource) on one node fails to start with “CRS-5019” error after OS reboot

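A customer runs a 3-node Oracle RAC. The host of node 3 had to be shut down for maintenance, and after the reboot CRS on that node would not start, while the other two nodes (node 1 and node 2) kept running normally. During startup on node 3, the ora.asm resource hung while starting, waiting on "enq: DD - contention". These are brief notes on the analysis steps.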

GI alert log

2024-08-10 05:25:26.261: 
[cssd(62600)]CRS-1707:Lease acquisition for node rac3 number 2 completed
2024-08-10 05:25:27.553: 
[cssd(62600)]CRS-1605:CSSD voting file is online: /dev/mapper/grid03; details in /u01/product/grid/log/rac3/cssd/ocssd.log.
2024-08-10 05:25:27.559: 
[cssd(62600)]CRS-1605:CSSD voting file is online: /dev/mapper/grid02; details in /u01/product/grid/log/rac3/cssd/ocssd.log.
2024-08-10 05:25:27.568: 
[cssd(62600)]CRS-1605:CSSD voting file is online: /dev/mapper/grid01; details in /u01/product/grid/log/rac3/cssd/ocssd.log.
2024-08-10 05:25:32.109: 
[cssd(62600)]CRS-1601:CSSD Reconfiguration complete. Active nodes are rac3 rac1 rac2 .
...
[ohasd(62261)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2024-08-10 05:25:40.252: 
[ctssd(63018)]CRS-2408:The clock on host rac01 has been updated by the Cluster Time Synchronization Service to be synchronous with the mean cluster time.
[client(63124)]CRS-10001:10-Aug-24 05:25 ACFS-9391: Checking for existing ADVM/ACFS installation.
[client(63129)]CRS-10001:10-Aug-24 05:25 ACFS-9392: Validating ADVM/ACFS installation files for operating system.
[client(63131)]CRS-10001:10-Aug-24 05:25 ACFS-9393: Verifying ASM Administrator setup.
[client(63134)]CRS-10001:10-Aug-24 05:25 ACFS-9308: Loading installed ADVM/ACFS drivers.
[client(63137)]CRS-10001:10-Aug-24 05:25 ACFS-9154: Loading 'oracleoks.ko' driver.
[client(63169)]CRS-10001:10-Aug-24 05:25 ACFS-9154: Loading 'oracleadvm.ko' driver.
[client(63211)]CRS-10001:10-Aug-24 05:25 ACFS-9154: Loading 'oracleacfs.ko' driver.
[client(63316)]CRS-10001:10-Aug-24 05:25 ACFS-9327: Verifying ADVM/ACFS devices.
[client(63324)]CRS-10001:10-Aug-24 05:25 ACFS-9156: Detecting control device '/dev/asm/.asm_ctl_spec'.
[client(63328)]CRS-10001:10-Aug-24 05:25 ACFS-9156: Detecting control device '/dev/ofsctl'.
[client(63333)]CRS-10001:10-Aug-24 05:25 ACFS-9322: completed
2024-08-10 05:35:43.011: 
[/u01/product/grid/bin/oraagent.bin(62477)]CRS-5818:Aborted command 'start' for resource 'ora.asm'. Details at (:CRSAGF00113:) {0:0:2} in /u01/product/grid/log/rac3/agent/ohasd/oraagent_grid//oraagent_grid.log.
2024-08-10 05:35:45.015: 
[ohasd(62261)]CRS-2757:Command 'Start' timed out waiting for response from the resource 'ora.asm'. Details at (:CRSPE00111:) {0:0:2} in /u01/product/grid/log/rac01/ohasd/ohasd.log.
2024-08-10 05:35:45.111: 
[/u01/product/grid/bin/oraagent.bin(62477)]CRS-5019:All OCR locations are on ASM disk groups [OCR_VOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/product/grid/log/rac3/agent/ohasd/oraagent_grid//oraagent_grid.log".
2024-08-10 05:35:45.314: 
[ohasd(62261)]CRS-2807:Resource 'ora.crsd' failed to start automatically.
2024-08-10 05:35:46.299: 
[/u01/product/grid/bin/oraagent.bin(62477)]CRS-5019:All OCR locations are on ASM disk groups [OCR_VOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/product/grid/log/rac3/agent/ohasd/oraagent_grid//oraagent_grid.log".
2024-08-10 05:36:16.310: 
[/u01/product/grid/bin/oraagent.bin(62477)]CRS-5019:All OCR locations are on ASM disk groups [OCR_VOT], and none of these disk groups are mounted. Details are at "(:CLSN00100:)" in "/u01/product/grid/log/rac3/agent/ohasd/oraagent_grid//oraagent_grid.log".
2024-08-10 05:36:46.325: 

Note:
CRS-5019: the ASM disk group that holds the OCR (OCR_VOT) is reported as not mounted.
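When the alert log is long, a per-code count makes the repeating CRS-5019 entries easy to spot. The following is a minimal sketch, not part of the original analysis; the function name `crs_code_summary` is made up for illustration, and it only assumes that each message carries a `CRS-nnnn` code as in the excerpt above.

```shell
# Summarize how often each CRS-nnnn message code appears in an alert log.
# Reads the files given as arguments, or stdin when no file is given.
# (crs_code_summary is a hypothetical helper name, not an Oracle tool.)
crs_code_summary() {
    grep -ohE 'CRS-[0-9]+' "$@" | sort | uniq -c | awk '{ print $2, $1 }'
}
```

Usage: `crs_code_summary /path/to/GI/alert.log`, where the path is a placeholder for the Grid Infrastructure alert log of the affected node. Each output line is a message code followed by its count.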

CRS stack status

$ crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS       
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  INTERMEDIATE rac3                     OCR not started     
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       rac3                                         
ora.crf
      1        ONLINE  ONLINE       rac3                                         
ora.crsd
      1        ONLINE  OFFLINE                                                   
ora.cssd
      1        ONLINE  ONLINE       rac3                                         
ora.cssdmonitor
      1        ONLINE  ONLINE       rac3                                         
ora.ctssd
      1        ONLINE  ONLINE       rac3                     ACTIVE:0            
ora.diskmon
      1        OFFLINE OFFLINE                                                   
ora.evmd
      1        ONLINE  INTERMEDIATE rac3                                         
ora.gipcd
      1        ONLINE  ONLINE       rac3                                         
ora.gpnpd
      1        ONLINE  ONLINE       rac3                                         
ora.mdnsd
      1        ONLINE  ONLINE       rac3      

Note:
ora.asm is in INTERMEDIATE state with the detail “OCR not started”.
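To pick out the problem resources from the table above without reading every row, the output can be filtered for entries whose TARGET is ONLINE but whose STATE is not. This is a rough sketch under the assumption that the output has the layout shown above (a resource name line followed by an instance line); `crs_not_online` is a hypothetical helper name.

```shell
# List resources whose TARGET is ONLINE but whose STATE is something else.
# Expects `crsctl stat res -t -init` style text on stdin.
# Limitation: if SERVER is empty but STATE_DETAILS is not, the first word of
# the details is mistaken for the server name and dropped.
crs_not_online() {
    awk '
        /^ora\./ { name = $1; next }          # resource name line
        name != "" && $1 ~ /^[0-9]+$/ && $2 == "ONLINE" && $3 != "ONLINE" {
            line = name " " $3                # name and current state
            for (i = 5; i <= NF; i++)         # fields after SERVER are details
                line = line " " $i
            print line
        }
    '
}
```

Usage: `crsctl stat res -t -init | crs_not_online`. Resources whose target is OFFLINE, such as ora.diskmon here, are skipped on purpose.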

OLR check

$ ocrcheck -local 
Status of Oracle Local Registry is as follows :
         Version                  :
         Total space (kbytes)     :     262120
         Used space (kbytes)      :       2676
         Available space (kbytes) :     259444
         Device/File Name         : /u01/product/grid/cdata/rac01.olr
                                    Device/File integrity check succeeded
         Local registry integrity check succeeded
         Logical corruption check succeeded

ASM alert log

NOTE: cache opening disk 5 of grp 2: OCR_VOT_0005 path:/dev/mapper/grid03
NOTE: F1X0 found on disk 5 au 196 fcn 0.84109
NOTE: cache mounting (not first) normal redundancy group 2/0x67A3B07C (OCR_VOT)
kjbdomatt send to inst 1
kjbdomatt send to inst 3
Sat Aug 10 05:26:08 2024
NOTE: attached to recovery domain 2
Sat Aug 10 05:26:08 2024
NOTE: redo buffer size is 256 blocks (1053184 bytes)
Sat Aug 10 05:26:08 2024
NOTE: LGWR attempting to mount thread 1 for diskgroup 2 (OCR_VOT)
NOTE: LGWR found thread 1 closed at ABA 22.878
NOTE: LGWR mounted thread 1 for diskgroup 2 (OCR_VOT)
NOTE: LGWR opening thread 1 at fcn 0.106863 ABA 23.879
NOTE: cache mounting group 2/0x67A3B07C (OCR_VOT) succeeded
NOTE: cache ending mount (success) of group OCR_VOT number=2 incarn=0x67a3b07c
Sat Aug 10 05:26:08 2024
NOTE: Instance updated compatible.asm to 11.2.0.0.0 for grp 1
Sat Aug 10 05:48:41 2024
NOTE: [crsctl.bin@rac3 (TNS V1-V3) 74992] opening OCR file

Note:
No errors; the ASM alert log shows the OCR_VOT disk group mounting successfully.

votedisk check

# crsctl query css votedisk

Note:
Works fine; all voting disks are online.

Manually restart ASM

$ sqlplus / as sysasm

startup 
-- hang

Check ASM on all the other nodes

-- node 2 (running node)
$ asmcmd lsdg

$ ocrcheck
-- hang

$ sqlplus / as sysasm
-- check active sessions
The wait event is "enq: DD - contention"; the final blocking session is the gpnpd process on node 1.
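The post does not show the query used for the active-session check. One possible way to do it is sketched below: the SQL joins each "enq: DD - contention" waiter to its final blocker through the `FINAL_BLOCKING_INSTANCE` and `FINAL_BLOCKING_SESSION` columns of `gv$session`. It assumes a release where those columns exist (11.2 or later) and that `gv$session` can be queried from the ASM instance; `dd_waiters_sql` is a hypothetical helper name.

```shell
# Print a query that maps every "enq: DD - contention" waiter to its final
# blocker. Assumption: gv$session exposes FINAL_BLOCKING_INSTANCE and
# FINAL_BLOCKING_SESSION (11.2+). dd_waiters_sql is a made-up helper name.
dd_waiters_sql() {
cat <<'SQL'
SELECT w.inst_id         AS waiter_inst,
       w.sid             AS waiter_sid,
       w.program         AS waiter_program,
       w.seconds_in_wait AS waited_secs,
       b.inst_id         AS blocker_inst,
       b.sid             AS blocker_sid,
       b.program         AS blocker_program,
       b.process         AS blocker_client_pid
FROM   gv$session w
LEFT JOIN gv$session b
       ON  b.inst_id = w.final_blocking_instance
       AND b.sid     = w.final_blocking_session
WHERE  w.event = 'enq: DD - contention';
SQL
}
```

Usage on a node where ASM is still running: `dd_waiters_sql | sqlplus -S "/ as sysasm"`. The heredoc delimiter is quoted so that the shell leaves `gv$session` untouched.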

I ran into this wait event before and wrote about it in an earlier blog post, "Troubleshooting query v$asm_disk v$asm_diskgroup hang".

Solution
Kill the blocking gpnpd.bin process; after that everything returned to normal.
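Locating the process before killing it could look like the sketch below. It only filters a process listing for a command ending in `gpnpd.bin`; the helper name `gpnpd_pids` is made up, and the signal to use is not stated in the original, so the plain `kill` in the usage note is an assumption.

```shell
# Print the PID of every process whose command is gpnpd.bin.
# Expects `ps -eo pid,args` style lines on stdin (PID first, then the command).
# gpnpd_pids is a hypothetical helper name.
gpnpd_pids() {
    awk '$2 ~ /(^|\/)gpnpd\.bin$/ { print $1 }'
}
```

Usage on the node that holds the blocking session: `ps -eo pid,args | gpnpd_pids`, then `kill <pid>` with the PID that was printed. ohasd is normally expected to respawn gpnpd afterwards, but it is worth confirming with `crsctl stat res -t -init`.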
