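Exadata X7: RAC gipcd fails to start because of network socket files
Environment: an Oracle Exadata Machine (X7). After node 1 rebooted unexpectedly, it could not start again, while the other nodes kept running normally. The logs showed that the gipc process failed to start; after the network socket files were cleaned up, the startup succeeded.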
# GI alert log
2021-05-07 11:01:47.326 [OHASD(76631)]CRS-0714: Oracle Clusterware Release 18.0.0.0.0.
2021-05-07 11:01:47.338 [OHASD(76631)]CRS-2112: The OLR service started on node x7dbanbob01.
2021-05-07 11:01:47.357 [OHASD(76631)]CRS-8011: reboot advisory message from host: x7dbanbob01, component: cssmonit, with time stamp: L-2021-05-07-10:53:20.575
2021-05-07 11:01:47.357 [OHASD(76631)]CRS-1301: Oracle High Availability Service started on node x7dbanbob01.
2021-05-07 11:01:47.357 [OHASD(76631)]CRS-8013: reboot advisory message text: oracssdmonitor is about to reboot this node due to loss of network connectivity.
2021-05-07 11:01:47.358 [OHASD(76631)]CRS-8011: reboot advisory message from host: x7dbanbob01, component: cssagent, with time stamp: L-2021-05-07-09:30:08.511
2021-05-07 11:01:47.359 [OHASD(76631)]CRS-8013: reboot advisory message text: Rebooting node due to connection problems with CSS
2021-05-07 11:01:47.359 [OHASD(76631)]CRS-8017: location: /etc/oracle/lastgasp has 2 reboot advisory log files, 2 were announced and 0 errors occurred
2021-05-07 11:01:47.376 [ORAAGENT(58115)]CRS-5017: The resource action "ora.asm start" encountered the following error:
2021-05-07 11:01:47.376+CRS-5048: Failure communicating with CRS to access a resource profile or perform an action on a resource . For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/x7dbanbob01/crs/trace/crsd_oraagent_grid.trc".
2021-05-07 11:01:47.413 [ORAAGENT(58115)]CRS-5017: The resource action "ora.asm start" encountered the following error:
2021-05-07 11:01:47.413+CRS-5048: Failure communicating with CRS to access a resource profile or perform an action on a resource . For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/x7dbanbob01/crs/trace/crsd_oraagent_grid.trc".
2021-05-07 11:01:47.426 [ORAAGENT(58115)]CRS-5016: Process "/u01/app/18.0.0.0/grid/bin/lsnrctl" spawned by agent "ORAAGENT" for action "check" failed: details at "(:CLSN00010:)" in "/u01/app/grid/diag/crs/x7dbanbob01/crs/trace/crsd_oraagent_grid.trc"
2021-05-07 11:01:47.585 [ORAAGENT(58115)]CRS-5017: The resource action "ora.asm start" encountered the following error:
2021-05-07 11:01:47.585+CRS-5048: Failure communicating with CRS to access a resource profile or perform an action on a resource . For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/x7dbanbob01/crs/trace/crsd_oraagent_grid.trc".
2021-05-07 11:01:47.617 [ORAAGENT(58115)]CRS-5017: The resource action "ora.asm start" encountered the following error:
2021-05-07 11:01:47.617+CRS-5048: Failure communicating with CRS to access a resource profile or perform an action on a resource . For details refer to "(:CLSN00107:)" in "/u01/app/grid/diag/crs/x7dbanbob01/crs/trace/crsd_oraagent_grid.trc".
2021-05-07 11:01:48.301 [ORAROOTAGENT(76704)]CRS-8500: Oracle Clusterware ORAROOTAGENT process is starting with operating system process ID 76704
2021-05-07 11:01:48.312 [CSSDAGENT(76723)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 76723
2021-05-07 11:01:48.322 [ORAAGENT(76714)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 76714
2021-05-07 11:01:48.326 [CSSDMONITOR(76729)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 76729
2021-05-07 11:01:48.947 [ORAAGENT(76818)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 76818
2021-05-07 11:01:50.135 [ORAAGENT(76856)]CRS-8500: Oracle Clusterware ORAAGENT process is starting with operating system process ID 76856
2021-05-07 11:01:50.280 [MDNSD(76983)]CRS-8500: Oracle Clusterware MDNSD process is starting with operating system process ID 76983
2021-05-07 11:01:50.282 [EVMD(76984)]CRS-8500: Oracle Clusterware EVMD process is starting with operating system process ID 76984
2021-05-07 11:01:52.354 [GPNPD(77086)]CRS-8500: Oracle Clusterware GPNPD process is starting with operating system process ID 77086
2021-05-07 11:01:53.396 [GPNPD(77086)]CRS-2328: GPNPD started on node x7dbanbob01.
2021-05-07 11:01:54.387 [GIPCD(77184)]CRS-8500: Oracle Clusterware GIPCD process is starting with operating system process ID 77184
2021-05-07 11:02:00.673 [CSSDMONITOR(77414)]CRS-8500: Oracle Clusterware CSSDMONITOR process is starting with operating system process ID 77414
2021-05-07 11:02:00.915 [CSSDAGENT(77435)]CRS-8500: Oracle Clusterware CSSDAGENT process is starting with operating system process ID 77435
2021-05-07 11:02:01.662 [OSYSMOND(77576)]CRS-8500: Oracle Clusterware OSYSMOND process is starting with operating system process ID 77576
2021-05-07 11:02:02.157 [OCSSD(77451)]CRS-8500: Oracle Clusterware OCSSD process is starting with operating system process ID 77451
2021-05-07 11:02:03.230 [OCSSD(77451)]CRS-1713: CSSD daemon is started in hub mode
2021-05-07 11:02:12.138 [OCSSD(77451)]CRS-1656: The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /u01/app/grid/diag/crs/x7dbanbob01/crs/trace/ocssd.trc
2021-05-07 11:02:12.166 [OCSSD(77451)]CRS-1652: Starting clean up of CRSD resources.
2021-05-07 11:02:12.173 [OCSSD(77451)]CRS-1653: The clean up of the CRSD resources failed.
2021-05-07 11:02:12.179 [OCSSD(77451)]CRS-8503: Oracle Clusterware process OCSSD with operating system process ID 77451 experienced fatal signal or exception code 6.
2021-05-07T11:02:12.184960+08:00 Errors in file /u01/app/grid/diag/crs/x7dbanbob01/crs/trace/ocssd.trc (incident=185):
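Most of the alert log above is routine startup noise. As a rough sketch, a small filter can pull out only the entries that matter here: the reboot advisories (CRS-8011, CRS-8013), the CSS daemon termination (CRS-1656) and the fatal-signal report (CRS-8503). The function name filter_gi_alerts is made up for this example, and the file name alert.log inside the trace directory is an assumption; the directory itself is the one shown in the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print only the alert-log entries that point at the
# failure (reboot advisories, CSS daemon termination, fatal signal).
# The list of CRS codes is taken from the log excerpt above.
filter_gi_alerts() {
    grep -E 'CRS-(8011|8013|1656|8503):' "$1"
}

# Usage (the file name alert.log is an assumption, adjust to your system):
#   filter_gi_alerts /u01/app/grid/diag/crs/x7dbanbob01/crs/trace/alert.log
```

The same approach works for any other CRS codes you want to watch; only the alternation in the pattern changes.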
# cssd log
2021-05-07 11:13:45.190 : CSSD:1098254080: clssnmSendingThread: sent 4 status msgs to all nodes
2021-05-07 11:13:47.082 :GIPCXCPT:2165839616: gipcWaitF [EvmConWait : evmgipcio.c : 303]: EXCEPTION[ ret (uknown) (910) ] failed to wait on obj 0x7f3d600fa5c0 [0000000000032033] { gipcEndpoint : localAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=)(GIPCID=80bb0572-9d25a17b-38732))', remoteAddr 'clsc://(ADDRESS=(PROTOCOL=ipc)(KEY=SYSTEM.evm.acceptor.auth)(GIPCID=9d25a17b-80bb0572-83914))', numPend 5, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 83914, readyRef (nil), ready 0, wobj 0x7f3d600ede10, sendp 0x7f3d60125490 status 0flags 0xa100a716, flags-2 0x100, usrFlags 0x30020 }, reqList 0x7f3d811721d0, nreq 1, creq 0x7f3d811722a0 timeout 30000 ms, flags 0x240
2021-05-07 11:13:47.082 : CSSDGNS:2165839616: clssgnsGNSEvtHandler: clsce evt res CONN (3)
2021-05-07 11:13:47.082 : CSSDGNS:2165839616: clssgnsCheckGNSConfigured: CLSCE wait(30000) returned 0, clskerror: clsce: CRS-10203: (:CLSCE0063:) Could not connect to the Event Manager daemon, evtres 3
2021-05-07 11:13:47.082 : CSSDGNS:2165839616: clssgnsCheckGNSConfigured: CLSCE connection error, re-subscribing for GNS resource events.
2021-05-07 11:13:47.082 : CLSCEVT:2165839616: (:CLSCE0028:)clsce_unsubscribe 0x7f3d600f0540 successfully unsubscribed : 0
2021-05-07 11:13:50.199 : CSSD:1098254080: clssnmSendingThread: sending status msg to all nodes
2021-05-07 11:13:50.199 : CSSD:1098254080: clssnmSendingThread: sent 5 status msgs to all nodes
2021-05-07 11:13:54.208 : CSSD:1098254080: clssnmSendingThread: sending status msg to all nodes
2021-05-07 11:13:54.208 : CSSD:1098254080: clssnmSendingThread: sent 4 status msgs to all nodes
2021-05-07 11:13:58.217 : CSSD:1098254080: clssnmSendingThread: sending status msg to all nodes
2021-05-07 11:13:58.217 : CSSD:1098254080: clssnmSendingThread: sent 4 status msgs to all nodes
2021-05-07 11:13:59.495 :GIPCHTHR:1129244416: gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30010 loopCount 250 sendCount 90 recvCount 225 postCount 60 sendCmplCount 90 recvCmplCount 135
# gipcd log
2021-05-07 11:16:47.860 :GIPCDCLT:3791009536: gipcdClientThread: Client thread has exited
2021-05-07 11:16:47.860 : GIPCTLS:3788908288: gipcmodTlsDisconnect: [tls] disconnect issued on endp 0x7fe1d038ece0 [00000000000024f9] { gipcEndpoint : localAddr 'gipcha://x7dbanbob01:de2d-a349-ffd8-46df', remoteAddr 'gipcha://x7dbanbob04:gipcdha_x7dbanbob04_/f261-6b01-a138-f0e8', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 37134, readyRef (nil), ready 0, wobj 0x7fe1d02676e0, sendp (nil) status 0flags 0x26038606, flags-2 0x50, usrFlags 0x0 }
2021-05-07 11:16:47.860 :GIPCGMOD:3788908288: gipcmodGipcDisconnect: [gipc] Issued endpoint close for endp 0x7fe1d038ece0 [00000000000024f9] { gipcEndpoint : localAddr 'gipcha://x7dbanbob01:de2d-a349-ffd8-46df', remoteAddr 'gipcha://x7dbanbob04:gipcdha_x7dbanbob04_/f261-6b01-a138-f0e8', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 37134, readyRef (nil), ready 0, wobj 0x7fe1d02676e0, sendp (nil) status 0flags 0x26038606, flags-2 0x50, usrFlags 0x0 }
2021-05-07 11:16:47.860 :GIPCDNDE:3788908288: gipcdNodeThreadShutdown: Deleted connection with host x7dbanbob02
2021-05-07 11:16:47.860 : GIPCTLS:3788908288: gipcmodTlsDisconnect: [tls] disconnect issued on endp 0x7fe1c406d680 [0000000000002780] { gipcEndpoint : localAddr 'gipcha://x7dbanbob01:gipcdha_x7dbanbob01_/df58-2f47-456d-5509', remoteAddr 'gipcha://x7dbanbob02:aab0-5b04-74e3-b87c', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 36540, readyRef (nil), ready 0, wobj 0x7fe1c407e250, sendp (nil) status 0flags 0x26138606, flags-2 0x50, usrFlags 0x0 }
2021-05-07 11:16:47.860 :GIPCGMOD:3788908288: gipcmodGipcDisconnect: [gipc] Issued endpoint close for endp 0x7fe1c406d680 [0000000000002780] { gipcEndpoint : localAddr 'gipcha://x7dbanbob01:gipcdha_x7dbanbob01_/df58-2f47-456d-5509', remoteAddr 'gipcha://x7dbanbob02:aab0-5b04-74e3-b87c', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 36540, readyRef (nil), ready 0, wobj 0x7fe1c407e250, sendp (nil) status 0flags 0x26138606, flags-2 0x50, usrFlags 0x0 }
2021-05-07 11:16:47.860 :GIPCDNDE:3788908288: gipcdNodeThreadShutdown: Deleted connection with host x7dbanbob03
2021-05-07 11:16:47.861 :GIPCHGEN:3778402048: gipchaNodeMarkInfAsTransientF [gipchaDaemonProcessDisconnect : gipchaDaemonThread.c : 5615]: marking infs of node 0x7fe1d01320b0 { host '', haName 'gipcd_ha_name', srcLuid f3e864a7-00000000, dstLuid 00000000-00000000 numInf 2, sentRegister 0, localMonitor 0, baseStream 0x7fe1d00bf590 type gipchaNodeType12001 (20), nodeIncarnation 00000000-00000000, incarnation 0, cssIncarnation 0, negDigest 4294967295, roundTripTime 4294967295 lastSeenPingAck 0 nextPingId 1 latencySrc 0 latencyDst 0 flags 0x200001} as TRANSIENT
2021-05-07 11:16:47.861 : GIPCTLS:3788908288: gipcmodTlsDisconnect: [tls] disconnect issued on endp 0x7fe1c40c7b90 [0000000000002939] { gipcEndpoint : localAddr 'gipcha://x7dbanbob01:gipcdha_x7dbanbob01_/4fcc-c0b7-5128-77bc', remoteAddr 'gipcha://x7dbanbob03:4f30-1708-9e4e-34f6', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 36540, readyRef (nil), ready 0, wobj 0x7fe1c40b85f0, sendp (nil) status 0flags 0x26138606, flags-2 0x50, usrFlags 0x0 }
2021-05-07 11:16:47.861 :GIPCHGEN:3778402048: gipchaNodeMarkInfAsTransientF [gipchaDaemonProcessDisconnect : gipchaDaemonThread.c : 5632]: marking infs of node 0x7fe1b828a020 { host 'x7dbanbob02', haName 'gipcd_ha_name', srcLuid f3e864a7-ee9c96d5, dstLuid 1bc89bd8-74f78bf5 numInf 2, sentRegister 1, localMonitor 1, baseStream 0x7fe1b8283160 type gipchaNodeType12001 (20), nodeIncarnation 309eb76e-fffd426a, incarnation 0, cssIncarnation 2, negDigest 4294967295, roundTripTime 472 lastSeenPingAck 24 nextPingId 25 latencySrc 2985784710 latencyDst 1309183058 flags 0x860080c} as TRANSIENT
2021-05-07 11:16:47.861 :GIPCGMOD:3788908288: gipcmodGipcDisconnect: [gipc] Issued endpoint close for endp 0x7fe1c40c7b90 [0000000000002939] { gipcEndpoint : localAddr 'gipcha://x7dbanbob01:gipcdha_x7dbanbob01_/4fcc-c0b7-5128-77bc', remoteAddr 'gipcha://x7dbanbob03:4f30-1708-9e4e-34f6', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 36540, readyRef (nil), ready 0, wobj 0x7fe1c40b85f0, sendp (nil) status 0flags 0x26138606, flags-2 0x50, usrFlags 0x0 }
2021-05-07 11:16:47.861 :GIPCDNDE:3788908288: gipcdNodeThreadShutdown: Deleted connection with host x7dbanbob04
2021-05-07 11:16:47.861 :GIPCHGEN:3778402048: gipchaNodeMarkInfAsTransientF [gipchaDaemonProcessDisconnect : gipchaDaemonThread.c : 5632]: marking infs of node 0x7fe1b82853e0 { host 'x7dbanbob03', haName 'gipcd_ha_name', srcLuid f3e864a7-13b1632f, dstLuid 108fd767-bb86f361 numInf 2, sentRegister 1, localMonitor 1, baseStream 0x7fe1b82841e0 type gipchaNodeType12001 (20), nodeIncarnation a2a8094e-fffd43be, incarnation 0, cssIncarnation 2, negDigest 4294967295, roundTripTime 308 lastSeenPingAck 24 nextPingId 25 latencySrc 2998135837 latencyDst 1296831767 flags 0x860080c} as TRANSIENT
2021-05-07 11:16:47.861 : GIPCD:3786807040: gipcdSetThreadState: changing the status of monitorThread. current status gipcdThreadStatusOnline desired status gipcdThreadStatusOffline
2021-05-07 11:16:47.861 :GIPCHGEN:3778402048: gipchaNodeMarkInfAsTransientF [gipchaDaemonProcessDisconnect : gipchaDaemonThread.c : 5632]: marking infs of node 0x7fe1b828a480 { host 'x7dbanbob04', haName 'gipcd_ha_name', srcLuid f3e864a7-95347438, dstLuid 81a248b7-f01db79a numInf 2, sentRegister 1, localMonitor 1, baseStream 0x7fe1b8281820 type gipchaNodeType12001 (20), nodeIncarnation 8d987030-fffd4b52, incarnation 0, cssIncarnation 2, negDigest 4294967295, roundTripTime 583 lastSeenPingAck 24 nextPingId 25 latencySrc 3236986581 latencyDst 1057981298 flags 0x860080c} as TRANSIENT
2021-05-07 11:16:47.861 :GIPCDMON:3786807040: gipcdMonitorThread: Monitor thread is exiting..
2021-05-07 11:16:47.861 : GIPCTLS:3788908288: gipcmodTlsDisconnect: [tls] disconnect issued on endp 0x7fe1c407ed10 [00000000000029b7] { gipcEndpoint : localAddr 'gipcha://x7dbanbob01:gipcdha_x7dbanbob01_/2ac7-8166-0b84-0143', remoteAddr 'gipcha://x7dbanbob04:cabb-545e-6653-73e8', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 37134, readyRef (nil), ready 0, wobj 0x7fe1c40b9a50, sendp (nil) status 0flags 0x26138606, flags-2 0x50, usrFlags 0x0 }
2021-05-07 11:16:47.861 :GIPCGMOD:3788908288: gipcmodGipcDisconnect: [gipc] Issued endpoint close for endp 0x7fe1c407ed10 [00000000000029b7] { gipcEndpoint : localAddr 'gipcha://x7dbanbob01:gipcdha_x7dbanbob01_/2ac7-8166-0b84-0143', remoteAddr 'gipcha://x7dbanbob04:cabb-545e-6653-73e8', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 37134, readyRef (nil), ready 0, wobj 0x7fe1c40b9a50, sendp (nil) status 0flags 0x26138606, flags-2 0x50, usrFlags 0x0 }
2021-05-07 11:16:47.861 :GIPCHDEM:3778402048: gipchaDaemonConnect: connecting to daemon addr ipc://gipcd_x7dbanbob01
2021-05-07 11:16:47.861 :GIPCHGEN:3778402048: gipchaNodeMarkInfAsTransientF [gipchaDaemonCheckInterfaces : gipchaDaemonThread.c : 5956]: marking infs of node 0x7fe1d01320b0 { host '', haName 'gipcd_ha_name', srcLuid f3e864a7-00000000, dstLuid 00000000-00000000 numInf 2, sentRegister 0, localMonitor 0, baseStream 0x7fe1d00bf590 type gipchaNodeType12001 (20), nodeIncarnation 00000000-00000000, incarnation 0, cssIncarnation 0, negDigest 4294967295, roundTripTime 4294967295 lastSeenPingAck 0 nextPingId 1 latencySrc 0 latencyDst 0 flags 0x200001} as TRANSIENT
2021-05-07 11:16:47.861 : GIPCTLS:3788908288: gipcmodTlsDisconnect: [tls] disconnect issued on endp 0x7fe1d00a80f0 [00000000000002a9] { gipcEndpoint : localAddr 'gipcha://x7dbanbob01:gipcdha_x7dbanbob01_', remoteAddr '', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7fe1d00bf4d0, sendp (nil) status 0flags 0x26038607, flags-2 0x90, usrFlags 0x0 }
2021-05-07 11:16:47.861 :GIPCGMOD:3788908288: gipcmodGipcDisconnect: [gipc] Issued endpoint close for endp 0x7fe1d00a80f0 [00000000000002a9] { gipcEndpoint : localAddr 'gipcha://x7dbanbob01:gipcdha_x7dbanbob01_', remoteAddr '', numPend 1, numReady 0, numDone 0, numDead 0, numTransfer 0, objFlags 0x0, pidPeer 0, readyRef (nil), ready 0, wobj 0x7fe1d00bf4d0, sendp (nil) status 0flags 0x26038607, flags-2 0x90, usrFlags 0x0 }
2021-05-07 11:16:47.861 : GIPCD:3788908288: gipcdSetThreadState: changing the status of nodeThread. current status gipcdThreadStatusOnline desired status gipcdThreadStatusOffline
2021-05-07 11:16:47.861 :GIPCDNDE:3788908288: gipcdNodeThread: Node thread has exited
2021-05-07 11:16:47.864 :GIPCHGEN:3778402048: gipchaNodeAddInterfaceF: recovered TRANSIENT inf 0x7fe1b82a69e0 { host '', haName 'gipcd_ha_name', local (nil), ip '192.168.10.1:62173', subnet '192.168.8.0', mask '255.255.252.0', mac '80-00-02-08-fe-80-00-00-00-00-00-00-00-10-e0-00-01-d1-e7-59', ifname 'ib0', numRef 3, numFail 0, idxBoot 0, flags 0x184d }
2021-05-07 11:16:47.865 :GIPCHGEN:3778402048: gipchaNodeAddInterfaceF: recovered TRANSIENT inf 0x7fe1b82a6f60 { host '', haName 'gipcd_ha_name', local (nil), ip '192.168.10.2:34923', subnet '192.168.8.0', mask '255.255.252.0', mac '80-00-02-09-fe-80-00-00-00-00-00-00-00-10-e0-00-01-d1-e7-5a', ifname 'ib1', numRef 3, numFail 0, idxBoot 0, flags 0x184d }
2021-05-07 11:16:47.871 : GIPCD:4062002176: gipcdMain: All threads terminated
2021-05-07 11:16:47.871 : GIPCD:4062002176: gipcdMain: GIPCD terminated
Solution
Shut down GI on the affected node with "crsctl stop crs -f", then remove the network socket files as the root user:
# rm -rf /usr/tmp/.oracle/* /var/tmp/.oracle/* /tmp/.oracle/*
Start GI:
# crsctl start crs -wait
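
The steps above can be wrapped in a small script that refuses to touch the socket files while clusterware daemons are still running and that only prints what it would delete unless told otherwise. This is a minimal sketch, not an Oracle-provided tool: the function names, the APPLY and SOCKET_DIRS variables and the list of daemon names checked are all assumptions made for this example. The directories and the crsctl commands are the ones used above.

```shell
#!/usr/bin/env bash
# Socket directories from the fix above; SOCKET_DIRS is a hypothetical
# override so the logic can be tried against a scratch directory.
SOCKET_DIRS="${SOCKET_DIRS:-/usr/tmp/.oracle /var/tmp/.oracle /tmp/.oracle}"

# Succeeds only if none of these clusterware daemons is running.
# The daemon list is an assumption; extend it to match your stack.
gi_is_down() {
    ! ps -e -o comm= | grep -Eq '^(ohasd|ocssd|gipcd|crsd)\.bin$'
}

# Dry run by default: prints each entry under the socket directories.
# Deletes the entries only when APPLY=1 (same scope as "rm -rf dir/*").
clean_socket_files() {
    for dir in $SOCKET_DIRS; do
        [ -d "$dir" ] || continue
        for entry in "$dir"/*; do
            [ -e "$entry" ] || [ -L "$entry" ] || continue
            if [ "${APPLY:-0}" = "1" ]; then
                rm -rf -- "$entry"
                echo "removed $entry"
            else
                echo "would remove $entry"
            fi
        done
    done
}

# Full sequence from the steps above; run as root on the affected node only.
restart_gi_with_clean_sockets() {
    crsctl stop crs -f
    gi_is_down || { echo "clusterware still running, aborting" >&2; return 1; }
    APPLY=1 clean_socket_files
    crsctl start crs -wait
}
```

Calling clean_socket_files on its own gives a listing of what is currently in the socket directories, which is worth reviewing before running restart_gi_with_clean_sockets.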