Oracle 19c RAC instances restart frequently; OS log shows "avahi-daemon: Withdrawing address record"
There are always innovative customers at the leading edge of technology, and the most worrying problems are the ones with no prior references. Recently, in a very new environment (Oracle 19c two-node RAC on an IBM LinuxONE mainframe), the Oracle instances on some nodes of the same machine restarted frequently, and before each restart the OS log contained "avahi-daemon[4537]: Withdrawing address record for 28.83.70.4 on bond0.3112". This post briefly documents the case.
Environment information (from the DB alert log)
NOTE: remote asm mode is remote (mode 0x2; from cluster type)
NOTE: Cluster configuration type = CLUSTER [4]
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.4.0.0.0.
ORACLE_HOME: /u01/app/oracle/product/19.0.0/dbhome_1
System name: Linux
Node name: anboblpa4
Release: 3.10.0-693.el7.s390x
Version: #1 SMP Thu Jul 6 20:01:53 EDT 2017
Machine: s390x
...
Dynamic remastering is disabled
List of instances (total 2) : 1 2
My inst 2 (I'm a new instance)
...
Ping without log force is disabled: not an Exadata system.
Buffer Cache Full DB Caching mode changing from FULL CACHING DISABLED to FULL CACHING ENABLED
Picked broadcast on commit scheme to generate SCNs
Endian type of dictionary set to big
2019-10-11T14:07:47.939549+08:00
Redo log for group 3, sequence 1 is not located on DAX storage
...
===========================================================
Dumping current patch information
===========================================================
Patch Id: 29585399
Patch Description: OCW RELEASE UPDATE 19.3.0.0.0 (29585399)
Patch Apply Time: 2019-06-02T04:42:42+08:00
...
Patch Id: 29834717
Patch Description: Database Release Update : 19.4.0.0.190716 (29834717)
Patch Apply Time: 2019-10-09T13:50:10+08:00
...
Patch Id: 29774421
Patch Description: OJVM RELEASE UPDATE: 19.4.0.0.190716 (29774421)
Patch Apply Time: 2019-10-09T13:51:42+08:00
Database alert log
2020-01-04T10:04:31.601196+08:00
Thread 2 advanced to log sequence 73905 (LGWR switch)
Current log# 24 seq# 73905 mem# 0: +DATADG/A2CDB/ONLINELOG/group_24.405.1028475833
Current log# 24 seq# 73905 mem# 1: +DATADG/A2CDB/ONLINELOG/group_24.392.1028475843
2020-01-04T10:04:31.747748+08:00
ARC1 (PID:9942): Archived Log entry 38639 added for T-2.S-73904 ID 0xffffffff9876a713 LAD:1
2020-01-04T10:17:28.893387+08:00
Thread 2 advanced to log sequence 73906 (LGWR switch)
Current log# 25 seq# 73906 mem# 0: +DATADG/A2CDB/ONLINELOG/group_25.404.1028475833
Current log# 25 seq# 73906 mem# 1: +DATADG/A2CDB/ONLINELOG/group_25.388.1028475843
2020-01-04T10:17:29.016138+08:00
ARC2 (PID:9944): Archived Log entry 38643 added for T-2.S-73905 ID 0xffffffff9876a713 LAD:1
2020-01-04T10:29:23.921516+08:00
PMON (ospid: ): terminating the instance due to ORA error
2020-01-04T10:29:23.921625+08:00
Cause - 'Instance is being terminated due to fatal process death (pid: 5, ospid: 8726, IPC0)'
2020-01-04T10:29:25.777379+08:00
Instance terminated by PMON, pid = 8718
2020-01-04T10:32:01.383411+08:00
Starting ORACLE instance (normal) (OS id: 8531)
....
2020-01-04T10:32:30.496195+08:00
CJQ0 started with pid=96, OS id=10437
Completed: ALTER DATABASE OPEN /* db agent *//* {2:17390:2} */
CRSD.LOG
## crsd.log 2020-01-04 10:27:31.168 :GIPCHTHR:2329880848: gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30041 loopCount 60 sendCount 12 recvCount 36 postCount 12 sendCmplCount 12 recvCmplCount 12 2020-01-04 10:27:32.269 :GIPCHTHR:2327783696: gipchaDaemonWork: DaemonThread heart beat, time interval since last heartBeat 30041loopCount 47 2020-01-04 10:28:01.229 :GIPCHTHR:2329880848: gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30061 loopCount 62 sendCount 12 recvCount 36 postCount 12 sendCmplCount 12 recvCmplCount 12 2020-01-04 10:28:02.331 :GIPCHTHR:2327783696: gipchaDaemonWork: DaemonThread heart beat, time interval since last heartBeat 30062loopCount 46 2020-01-04 10:28:31.273 :GIPCHTHR:2329880848: gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30044 loopCount 67 sendCount 14 recvCount 42 postCount 14 sendCmplCount 14 recvCmplCount 14 2020-01-04 10:28:32.374 :GIPCHTHR:2327783696: gipchaDaemonWork: DaemonThread heart beat, time interval since last heartBeat 30043loopCount 46 2020-01-04 10:29:01.315 :GIPCHTHR:2329880848: gipchaWorkerWork: workerThread heart beat, time interval since last heartBeat 30042 loopCount 60 sendCount 12 recvCount 36 postCount 12 sendCmplCount 12 recvCmplCount 12 2020-01-04 10:29:02.415 :GIPCHTHR:2327783696: gipchaDaemonWork: DaemonThread heart beat, time interval since last heartBeat 30041loopCount 46 2020-01-04 10:29:19.978 :UiServer:749717776: [ INFO] {2:17417:6560} Container [ Name: FENCESERVER API_HDR_VER: TextMessage[3] CLIENT: TextMessage[] CLIENT_NAME: TextMessage[ocssd.bin] CLIENT_PID: TextMessage[6809] CLIENT_PRIMARY_GROUP: TextMessage[asmadmin] LOCALE: TextMessage[AMERICAN_AMERICA.AL32UTF8] ] 2020-01-04 10:29:19.978 :UiServer:749717776: [ INFO] {2:17417:6560} Sending message to AGFW. ctx= 0x3fec404e010, Client PID: 6809 2020-01-04 10:29:19.978 : OCRAPI:749717776: procr_beg_asmshut: OCR ctx set to donotterminate state. Return [0]. 
<<<<<<<<<<<<<<<<<<< 2020-01-04 10:29:19.978 :UiServer:749717776: [ INFO] {2:17417:6560} Force-disconnecting [21] existing PE clients... 2020-01-04 10:29:19.978 :UiServer:749717776: [ INFO] {2:17417:6560} Disconnecting client of command id :28 2020-01-04 10:29:19.978 :UiServer:749717776: [ INFO] {2:17417:6560} Disconnecting client of command id :43 2020-01-04 10:29:19.978 :UiServer:749717776: [ INFO] {2:17417:6560} Disconnecting client of command id :46 ... 2020-01-04 10:29:19.979 :UiServer:749717776: [ INFO] {2:17417:6560} Disconnecting client of command id :272 2020-01-04 10:29:19.979 :UiServer:749717776: [ INFO] {2:17417:6560} Disconnecting client of command id :273 2020-01-04 10:29:19.979 :UiServer:749717776: [ INFO] {2:17417:6560} Disconnecting client of command id :275 2020-01-04 10:29:19.979 :UiServer:749717776: [ INFO] {2:17417:6560} Disconnecting client of command id :292 2020-01-04 10:29:19.979 :UiServer:749717776: [ INFO] {2:17417:6560} Sending message: 554957 to AGFW proxy server. 
2020-01-04 10:29:19.979 : AGFW:768592144: [ INFO] {2:17417:6560} Agfw Proxy Server received the message: FENCE_CMD[Proxy] ID 20489:554957 2020-01-04 10:29:19.979 : AGFW:768592144: [ INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.ASMNET1LSNR_ASM.lsnr 2 1] ID 4100:554958 to the agent /u01/app/19.0.0/grid/bin/oraagent_grid 2020-01-04 10:29:19.979 : CRSPE:768592144: [ INFO] {2:17417:6560} Skipping Fence of : ora.CRSDG.dg 2020-01-04 10:29:19.979 : CRSPE:768592144: [ INFO] {2:17417:6560} Skipping Fence of : ora.DATADG.dg 2020-01-04 10:29:19.979 : CRSPE:768592144: [ INFO] {2:17417:6560} Skipping Fence of : ora.FRADG.dg 2020-01-04 10:29:19.979 : AGFW:768592144: [ INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.LISTENER.lsnr anboblpa4 1] ID 4100:554959 to the agent /u01/app/19.0.0/grid/bin/oraagent_grid 2020-01-04 10:29:19.979 : CRSPE:768592144: [ INFO] {2:17417:6560} Skipping Fence of : ora.a2cdb.db 2020-01-04 10:29:19.979 : CRSPE:768592144: [ INFO] {2:17417:6560} Skipping Fence of : ora.asm 2020-01-04 10:29:19.979 : AGFW:768592144: [ INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.asmnet1.asmnetwork 2 1] ID 4100:554960 to the agent /u01/app/19.0.0/grid/bin/orarootagent_root 2020-01-04 10:29:19.979 : AGFW:768592144: [ INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.net1.network anboblpa4 1] ID 4100:554961 to the agent /u01/app/19.0.0/grid/bin/orarootagent_root 2020-01-04 10:29:19.979 : AGFW:768592144: [ INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.ons anboblpa4 1] ID 4100:554962 to the agent /u01/app/19.0.0/grid/bin/oraagent_grid 2020-01-04 10:29:19.979 : AGFW:768592144: [ INFO] {2:17417:6560} Agfw Proxy Server sending message: RESOURCE_CLEAN[ora.anboblpa4.vip 1 1] ID 4100:554963 to the agent /u01/app/19.0.0/grid/bin/orarootagent_root 2020-01-04 10:29:19.979 : AGFW:768592144: [ INFO] {2:17417:6560} dumpPendingFences:Started Resource 
Fencing, Count = 6 2020-01-04 10:29:19.979 : AGFW:768592144: [ INFO] {2:17417:6560} ora.ASMNET1LSNR_ASM.lsnr 2 1;ora.LISTENER.lsnr anboblpa4 1;ora.asmnet1.asmnetwork 2 1;ora.net1.network anboblpa4 1;ora.ons anboblpa4 1;ora.anboblpa4.vip 1 1;--- 2020-01-04 10:29:19.981 : AGFW:768592144: [ INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.asmnet1.asmnetwork 2 1] ID 4100:554960 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root 2020-01-04 10:29:19.981 : AGFW:768592144: [ INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.asmnet1.asmnetwork 2 1] ID 4100:554960 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root 2020-01-04 10:29:19.981 : AGFW:768592144: [ INFO] {2:17417:6560} Fenced off the resource [ora.asmnet1.asmnetwork] 2020-01-04 10:29:19.981 : AGFW:768592144: [ INFO] {2:17417:6560} dumpPendingFences:Pending Fence(s), Count = 5 2020-01-04 10:29:19.981 : AGFW:768592144: [ INFO] {2:17417:6560} ora.ASMNET1LSNR_ASM.lsnr 2 1;ora.LISTENER.lsnr anboblpa4 1;ora.net1.network anboblpa4 1;ora.ons anboblpa4 1;ora.anboblpa4.vip 1 1;--- 2020-01-04 10:29:19.982 : AGFW:768592144: [ INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.net1.network anboblpa4 1] ID 4100:554961 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root 2020-01-04 10:29:19.982 : AGFW:768592144: [ INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.net1.network anboblpa4 1] ID 4100:554961 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root 2020-01-04 10:29:19.982 : AGFW:768592144: [ INFO] {2:17417:6560} Fenced off the resource [ora.net1.network] 2020-01-04 10:29:19.982 : AGFW:768592144: [ INFO] {2:17417:6560} dumpPendingFences:Pending Fence(s), Count = 4 2020-01-04 10:29:19.982 : AGFW:768592144: [ INFO] {2:17417:6560} ora.ASMNET1LSNR_ASM.lsnr 2 1;ora.LISTENER.lsnr anboblpa4 1;ora.ons anboblpa4 1;ora.anboblpa4.vip 1 1;--- 2020-01-04 10:29:19.984 : AGFW:768592144: [ INFO] {2:17417:6560} Received 
the reply to the message: RESOURCE_CLEAN[ora.ASMNET1LSNR_ASM.lsnr 2 1] ID 4100:554958 from the agent /u01/app/19.0.0/grid/bin/oraagent_grid 2020-01-04 10:29:19.984 : AGFW:768592144: [ INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.LISTENER.lsnr anboblpa4 1] ID 4100:554959 from the agent /u01/app/19.0.0/grid/bin/oraagent_grid 2020-01-04 10:29:20.089 : AGFW:768592144: [ INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.anboblpa4.vip 1 1] ID 4100:554963 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root 2020-01-04 10:29:20.089 : AGFW:768592144: [ INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.anboblpa4.vip 1 1] ID 4100:554963 from the agent /u01/app/19.0.0/grid/bin/orarootagent_root 2020-01-04 10:29:20.089 : AGFW:768592144: [ INFO] {2:17417:6560} Fenced off the resource [ora.anboblpa4.vip] <<<<<<<<<<<<<<<<<<< 2020-01-04 10:29:20.089 : AGFW:768592144: [ INFO] {2:17417:6560} dumpPendingFences:Pending Fence(s), Count = 3 2020-01-04 10:29:20.089 : AGFW:768592144: [ INFO] {2:17417:6560} ora.ASMNET1LSNR_ASM.lsnr 2 1;ora.LISTENER.lsnr anboblpa4 1;ora.ons anboblpa4 1;--- ... 
2020-01-04 10:29:21.640 : OCRSRV:805304592: proas_amiwriter: ctx is in some other state 2020-01-04 10:29:21.687 : AGFW:768592144: [ INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.LISTENER.lsnr anboblpa4 1] ID 4100:554959 from the agent /u01/app/19.0.0/grid/bin/oraagent_grid 2020-01-04 10:29:21.687 : AGFW:768592144: [ INFO] {2:17417:6560} Fenced off the resource [ora.LISTENER.lsnr] 2020-01-04 10:29:21.687 : AGFW:768592144: [ INFO] {2:17417:6560} dumpPendingFences:Pending Fence(s), Count = 1 2020-01-04 10:29:21.687 : AGFW:768592144: [ INFO] {2:17417:6560} ora.ASMNET1LSNR_ASM.lsnr 2 1;--- 2020-01-04 10:29:21.687 : AGFW:768592144: [ INFO] {2:17417:6560} Received the reply to the message: RESOURCE_CLEAN[ora.ASMNET1LSNR_ASM.lsnr 2 1] ID 4100:554958 from the agent /u01/app/19.0.0/grid/bin/oraagent_grid 2020-01-04 10:29:21.687 : AGFW:768592144: [ INFO] {2:17417:6560} Fenced off the resource [ora.ASMNET1LSNR_ASM.lsnr] 2020-01-04 10:29:21.687 : AGFW:768592144: [ INFO] {2:17417:6560} dumpPendingFences:Pending Fence(s), Count = 0 2020-01-04 10:29:21.688 : CRSPE:768592144: [ INFO] {2:17417:6560} Fence command status: UI_DATA 2020-01-04 10:29:21.688 : AGFW:768592144: [ INFO] {2:17417:6560} Fence command completed, rc = 0 2020-01-04 10:29:21.688 :UiServer:749717776: [ INFO] {2:17417:6560} UI server recvd reply from Agfw Proxy Server: 116530 2020-01-04 10:29:21.688 :UiServer:749717776: [ INFO] {2:17417:6560} Response: c1|7!UI_DATAk6|RESULTt1|0 2020-01-04 10:29:21.688 :UiServer:749717776: [ INFO] {2:17417:6560} Done for ctx=0x3fec404e010 2020-01-04 10:29:21.690 : OCRSRV:805304592: proas_amiwriter: ctx is in some other state Trace file /u01/app/grid/diag/crs/anboblpa4/crs/trace/crsd.trc Oracle Database 19c Clusterware Release 19.0.0.0.0 - Production Version 19.4.0.0.0 Copyright 1996, 2019 Oracle. All rights reserved. 
default:2381237440: 1: clskec:has:CLSU:910 4 args[clsdAdr_CLSK_err][mod=clsdadr.c][loc=(:CLSD00302:)][msg=2020-01-04 10:29:21.861 (:CLSD00302:) Trace file size and number of segments fetched from environment variable: ORA_DAEMON_TRACE_FILE_OPTIONS filesize=26214400,numsegments=10 Detected in function clsdAdrGetEnvVar_TFO at line number 6819. ] ... CLSB:2652294336: [ INFO] Argument count (argc) for this daemon is 2 CLSB:2652294336: [ INFO] Argument 0 is: /u01/app/19.0.0/grid/bin/crsd.bin CLSB:2652294336: [ INFO] Argument 1 is: reboot 2020-01-04 10:31:18.928 : CRSMAIN:2652294336: [ INFO] First attempt: init CSS context succeeded. 2020-01-04 10:31:18.928 : CRSMAIN:2652294336: [ INFO] Start mode: normal 2020-01-04 10:31:18.930 : CLSDMT:2146957584: [ INFO] PID for the Process [7535], connkey CRSD 2020-01-04 10:31:19.305 : CRSMAIN:2652294336: [ INFO] CRS Daemon Starting
ocssd.log
2020-01-04 10:29:15.128 : CSSD:2514209040: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 17:2:2 2020-01-04 10:29:15.128 : CSSD:2526538000: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 17:2:2, datatype 1 datasize 4 2020-01-04 10:29:15.129 : CSSD:2511063312: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 17:2:2 from clientID 2:49:2 2020-01-04 10:29:16.683 : CSSD:2520246544: [ INFO] clssgmpcGMCReqWorkerThread: processing msg (0x3ff74043d70) type 2, msg size 76, payload (0x3ff74043d9c) size 32, sequence 1505227, for clientID 2:49:2 2020-01-04 10:29:17.178 : CSSD:2514209040: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 17:2:2 2020-01-04 10:29:17.178 : CSSD:2514209040: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 17:2:2 2020-01-04 10:29:17.178 : CSSD:2526538000: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 17:2:2, datatype 1 datasize 4 2020-01-04 10:29:17.179 : CSSD:2511063312: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 17:2:2 from clientID 2:49:2 2020-01-04 10:29:18.229 : CSSD:2511063312: [ INFO] clssgmcpGroupDataResp: sending type 5, size 166, status 0 to clientID 2:24:0 2020-01-04 10:29:18.229 : CSSD:2514209040: [ INFO] clssgmMemberPublicInfo: group DG_CRSDG member 0 not found 2020-01-04 10:29:18.229 : CSSD:2517100816: [ INFO] clssgmpcGMCReqWorkerThread: processing msg (0x3ff7403e390) type 2, msg size 85, payload (0x3ff7403e3bc) size 41, sequence 1505230, for clientID 2:24:0 2020-01-04 10:29:18.552 : CSSD:839383312: [ INFO] clssnmSendingThread: sending status msg to all nodes 2020-01-04 10:29:18.552 : CSSD:839383312: [ INFO] clssnmSendingThread: sent 4 status msgs to all nodes 2020-01-04 10:29:19.238 : CSSD:2514209040: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 17:2:2 2020-01-04 10:29:19.238 : CSSD:2514209040: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 
17:2:2
2020-01-04 10:29:19.238 : CSSD:2526538000: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 17:2:2, datatype 1 datasize 4
2020-01-04 10:29:19.239 : CSSD:2511063312: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 17:2:2 from clientID 2:49:2
2020-01-04 10:29:19.908 : CSSD:2514209040: [ INFO] : Processing member data change type 1, size 600 for group GR+DB_A2CDB, memberID 119:2:1
2020-01-04 10:29:19.908 : CSSD:2514209040: [ INFO] : Sending member data change to GMP for group GR+DB_A2CDB, memberID 119:2:1
2020-01-04 10:29:19.908 : CSSD:2526538000: [ INFO] clssgmpcMemberDataUpdt: grockName GR+DB_A2CDB memberID 119:2:1, datatype 1 datasize 600
2020-01-04 10:29:19.909 : CSSD:2511063312: [ INFO] clssgmSendEventsToMbrs: Group GR+DB_A2CDB, member count 1, event master 0, event type 6, event incarn 21350, event member count 1, pids 8779-13034,
2020-01-04 10:29:19.909 : CSSD:2511063312: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 119:2:1 from clientID 2:87:2
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssgmTermMember: Terminating memberID 119:2:1 (0x3ff400783a0) in grock GR+DB_A2CDB <<<<<<<<<<<<<<<<<<<
2020-01-04 10:29:19.909 : CSSD:2514209040: ASSERT clsssc.c 8607
2020-01-04 10:29:19.909 : CSSD:2514209040: clssscRefFree: object(0x3ff4842b670) has 0 reference prior to decrement, object may have been deallocated! <<<<<<<<<<<<<<<<<<<
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssnmCheckForNetworkFailure: Entered
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssnmCheckForNetworkFailure: skipping 0 defined 0
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssnmCheckForNetworkFailure: expiring 0 evicted 0 evicting node 0 this node 1
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssnmCheckForNetworkFailure: expiring 0 evicted 0 evicting node 0 this node 2
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssnmCheckForNetworkFailure: skipping 3 defined 0
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssnmCheckForNetworkFailure: skipping 4 defined 0
...
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssnmCheckForNetworkFailure: skipping 30 defined 0
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssnmCheckForNetworkFailure: skipping 31 defined 0
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssscExit: Call to clscal flush successful and clearing the CLSSSCCTX_INIT_CALOG flag so that no further CA logging happens
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] clssnmRemoveNodeInTerm: node 2, anboblpa4 terminated. Removing from its own member and connected bitmaps
2020-01-04 10:29:19.909 : CSSD:2514209040: [ ERROR] ###################################
2020-01-04 10:29:19.909 : CSSD:2514209040: [ ERROR] clssscExit: CSSD aborting from thread GMClientListener
2020-01-04 10:29:19.909 : CSSD:2514209040: [ ERROR] ###################################
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] (:CSSSC00012:)clssscExit: A fatal error occurred and the CSS daemon is terminating abnormally
2020-01-04 10:29:19.909 : CSSD:2514209040: [ INFO] ####### Begin Diagnostic Dump #######
2020-01-04 10:29:19.910 : CSSD:2514209040: [ INFO] ### Begin diagnostic data for the Core layer ###
2020-01-04 10:29:19.910 : CSSD:2514209040: [ INFO] Initialization successfully completed OK
2020-01-04 10:29:19.910 : CSSD:2514209040: [ INFO] Initialization of EXADATA fencing successfully completed OK
2020-01-04 10:29:19.910 : CSSD:2514209040: [ INFO] #### End diagnostic data for the Core layer ####
2020-01-04 10:29:19.910 : CSSD:2514209040: [ INFO] ### Begin diagnostic data for the GM Client layer ###
2020-01-04 10:29:19.910 : CSSD:2514209040: Status for clientID 2:11:2, pid(6609-5118), GIPC endpt 0x261e, flags 0x0004, refcount 3, aborted at 0, fence is not progress OK
2020-01-04 10:29:19.910 : CSSD:2514209040: [ INFO] memberID 5:2:2, group EVMDMAIN2 refcount 3, state 0x0000, granted 0, fence is not in progress OK
2020-01-04 10:29:19.910 : CSSD:2514209040: Status for clientID 2:13:1, pid(6758-5541), GIPC endpt 0x3ebd, flags 0x0004, refcount 3, aborted at 0, fence is not progress OK
2020-01-04 10:29:19.910 : CSSD:2514209040: [ INFO] memberID 11:2:0, group CRF- refcount 3, state 0x0000, granted 0, fence is not in progress OK
...
2020-01-04 10:29:19.910 : CSSD:2514209040: [ INFO] memberID 26:2:2, group ocr_dblpa4-cluster refcount 3, state 0x0000, granted 0, fence is not in progress OK 2020-01-04 10:29:19.910 : CSSD:2514209040: Status for clientID 2:29:4, pid(7488-7910), GIPC endpt 0x3a1b, flags 0x0006, refcount 3, aborted at 0, fence is not progress OK 2020-01-04 10:29:19.910 : CSSD:2514209040: [ INFO] memberID 7:2:1, group crs_version refcount 3, state 0x0000, granted 0, fence is not in progress OK ... 01-04 10:29:19.911 : CSSD:2514209040: Status for clientID 2:579:2, pid(9208-13294), GIPC endpt 0x45fbd4, flags 0x0002, refcount 2, aborted at 0, fence is not progress OK 2020-01-04 10:29:19.911 : CSSD:2514209040: Status for clientID 2:84:1, pid(8726-13006), GIPC endpt 0xbc75, flags 0x3000, refcount 3, aborted at 0, fence is not progress OK 2020-01-04 10:29:19.911 : CSSD:2514209040: [ INFO] memberID 104:2:1, group IPC0_GRP_dblpa4-cluster_a2cdb refcount 3, state 0x0000, granted 0, fence is not in progress OK 2020-01-04 10:29:19.911 : CSSD:2514209040: Status for clientID 2:85:1, pid(8742-13021), GIPC endpt 0xbe61, flags 0x2000, refcount 3, aborted at 0, fence is not progress OK ----- Call Stack Trace ----- 2020-01-04 10:29:19.913 : CSSD:2514209040: calling call entry argument values in hex 2020-01-04 10:29:19.913 : CSSD:2514209040: location type point (? 
means dubious value) 2020-01-04 10:29:19.913 : CSSD:2514209040: -------------------- -------- -------------------- ---------------------------- 2020-01-04 10:29:19.913 : CSSD:2514209040: ssdgetcall: Failure to recover Stack Trace: starting frame address is (nil) 2020-01-04 10:29:19.914 : CSSD:2514209040: clssscExit()+1860 call kgdsdst() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssscAssert()+210 call clssscExit() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssscRefFreeInt()+256 call clssscAssert() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssgmTermMember()+206 call clssscRefFreeInt() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssgmcClientDestroy()+266 call clssgmTermMember() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssscHTDestroyObj()+334 call clssgmcClientDestroy() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssscHTRefDestroyObj()+50 call clssscHTDestroyObj() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssscRefFreeInt()+180 call clssscHTRefDestroyObj() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssgmclienteventhndlr()+2556 call clssscRefFreeInt() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssscSelect()+1568 call clssgmclienteventhndlr() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssgmProcClientReqs()+2204 call clssscSelect() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssgmclientlsnr()+ call clssgmProcClientReqs() 2020-01-04 10:29:19.914 : CSSD:2514209040: clssscthrdmain()+26 call clssgmclientlsnr() 2020-01-04 10:29:19.914 : CSSD:2514209040: start_thread()+234 call clssscthrdmain() ... 2020-01-04 10:29:20.054 : CSSD:840956176: [ INFO] clssnmCheckForNetworkFailure: skipping 30 defined 0 2020-01-04 10:29:20.054 : CSSD:840956176: [ INFO] clssnmCheckForNetworkFailure: skipping 31 defined 0 2020-01-04 10:29:20.054 : CSSD:840956176: [ ERROR] clssscExit: CSSD aborting from thread clssnmPollingThread 2020-01-04 10:29:20.054 : CSSD:840956176: [ INFO] clssscExit: abort already set 1 ... 
2020-01-04 10:29:21.057 : CSSD:2520246544: [ INFO] clssgmRPC: RPC(#1) to node(8298) not sent due to impending local GM shutdown 2020-01-04 10:29:21.057 : CSSD:2520246544: [ INFO] clssgmRPC: failed to send RPC#8298 to node(1), rpcret(10), master(1), DBInfo(1), masterRPC(1),unsentRPC(0), queuing RPC to unsent queue 2020-01-04 10:29:21.057 : CSSD:2520246544: [ INFO] clssgmDelMemCmpl: rpc 0x3ff96c83f90, ret 10, clientID 2:34:10 memberID 7:2:4 2020-01-04 10:29:21.058 : CSSD:2511063312: [ INFO] clssgmcpMbrDeleteResp: Status -16 deleting memberID 7:2:4 from clientID 2:34:10 2020-01-04 10:29:21.288 : CSSD:2751404192: [ INFO] : Processing member data change type 1, size 4 for group HB+ASM, memberID 17:2:2 2020-01-04 10:29:21.288 : CSSD:2751404192: [ INFO] : Sending member data change to GMP for group HB+ASM, memberID 17:2:2 2020-01-04 10:29:21.288 : CSSD:2526538000: [ INFO] clssgmpcMemberDataUpdt: grockName HB+ASM memberID 17:2:2, datatype 1 datasize 4 2020-01-04 10:29:21.288 : CSSD:2526538000: [ INFO] clssgmpeersend: Local node is terminating, send aborted for node dx2dblpa4, number 1 2020-01-04 10:29:21.288 : CSSD:2526538000: [ WARNING] clssgmBroadcastMap: clssgmpeersend node(1) failed - 0 2020-01-04 10:29:21.288 : CSSD:2511063312: [ INFO] clssgmcpDataUpdtCmpl: Status 0 mbr data updt memberID 17:2:2 from clientID 2:49:2 2020-01-04 10:29:21.688 : CSSD:2514209040: [ INFO] clssscExit: CRSD cleanup status 0 2020-01-04 10:29:21.690 : CSSD:2514209040: [ INFO] clssscagProcAgReq: Sending response that CSSD is shutting down reason 0x0 2020-01-04 10:29:21.690 : CSSD:2514209040: [ INFO] clssscExit: CRSD cleanup successfully completed 2020-01-04 10:29:21.691 : CSSD:2514209040: [ INFO] clssgmKillAllClients: Adding pid 7488-7910 to the kill request 2020-01-04 10:29:21.691 : CSSD:2514209040: [ INFO] clssgmKillAllClients: Adding pid 7812-8904 to the kill request
ohasd_cssdagent_root
2020-01-04 10:29:33.535 [CSSDAGENT(6791)]CRS-1661: The CSS daemon is not responding. If this persists, a reboot will occur in 13960 milliseconds
OS messages log
Jan 4 10:29:17 anboblpa4 systemd-logind: New session 43767 of user grid.
Jan 4 10:29:17 anboblpa4 systemd: Started Session 43767 of user grid.
Jan 4 10:29:17 anboblpa4 systemd: Starting Session 43767 of user grid.
Jan 4 10:29:18 anboblpa4 systemd-logind: Removed session 43767.
Jan 4 10:29:19 anboblpa4 avahi-daemon[4537]: Withdrawing address record for 28.83.70.4 on bond0.3112.
Jan 4 10:27:14 anboblpa4 journal: Runtime journal is using 8.0M (max allowed 3.1G, trying to leave 4.0G free of 31.4G available → current limit 3.1G).
Jan 4 10:27:14 anboblpa4 kernel: Initializing cgroup subsys cpuset
Jan 4 10:27:14 anboblpa4 kernel: Initializing cgroup subsys cpu
Jan 4 10:27:14 anboblpa4 kernel: Initializing cgroup subsys cpuacct
Jan 4 10:27:14 anboblpa4 kernel: Linux version 3.10.0-693.el7.s390x (mockbuild@s390-014.build.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC) ) #1 SMP Thu Jul 6 20:01:53 EDT 2017
Jan 4 10:27:14 anboblpa4 kernel: setup: Linux is running natively in 64-bit mode
OSW (OSWatcher)
-- VMSTAT
zzz ***Sat Jan 4 10:29:09 CST 2020
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd   free     buff  cache    si so  bi   bo   in    cs    us sy id wa st
 3  0 0      12239196 46764 40267704 0  0   432  93   2     3     2  1 97  0  0
 1  0 0      12227552 46764 40268336 0  0   136  292  10503 16484 3  3 94  0  0
 0  0 0      12229828 46764 40267980 0  0   8368 156  10164 17217 2  1 97  0  0
zzz ***Sat Jan 4 10:29:39 CST 2020
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b swpd   free     buff  cache    si so  bi   bo   in    cs    us sy id wa st
 3  0 0      53626732 44196 9418284  0  0   432  93   2     3     2  1 97  0  0
 0  0 0      53624900 44196 9418216  0  0   36   779  4824  8191  2  2 97  0  0
 0  0 0      53625828 44196 9418264  0  0   36   311  3262  5288  0  0 99  0  0
-- TOP
zzz ***Sat Jan 4 10:29:39 CST 2020
top - 10:29:41 up 16 days, 1:33, 1 user, load average: 0.59, 0.80, 0.93
Tasks: 277 total, 1 running, 276 sleeping, 0 stopped, 0 zombie
%Cpu(s): 1.0 us, 0.9 sy, 0.0 ni, 97.9 id, 0.1 wa, 0.0 hi, 0.0 si, 0.1 st
KiB Mem : 65946540 total, 53624596 free, 2859464 used, 9462480 buff/cache
KiB Swap: 33550332 total, 33550332 free, 0 used. 61802316 avail Mem
  PID USER     PR NI VIRT    RES    SHR    S %CPU %MEM TIME+     COMMAND
 4578 node_ex+ 20  0 117028  13180  5924   S  3.0  0.0 55:01.24  node_expor+
 6758 root     rt  0 1530652 189620 105952 S  2.0  0.3 309:12.92 osysmond.b+
 6791 root     rt  0 1131744 153004 107768 S  2.0  0.2 15:29.29  cssdagent
 7648 root     rt -5 1114976 252424 104644 S  2.0  0.4 96:31.80  ologgerd
 6809 grid     rt  0 3047524 261744 112608 S  1.0  0.4 81:24.56  ocssd.bin
zzz ***Sat Jan 4 10:32:41 CST 2020
top - 10:32:42 up 2 min, 1 user, load average: 3.50, 1.15, 0.42    << the host has just been rebooted
Tasks: 454 total, 1 running, 452 sleeping, 0 stopped, 1 zombie
%Cpu(s): 2.6 us, 2.5 sy, 0.0 ni, 92.1 id, 2.5 wa, 0.1 hi, 0.2 si, 0.0 st
KiB Mem : 65946540 total, 29817876 free, 4089312 used, 32039352 buff/cache
KiB Swap: 33550332 total, 33550332 free, 0 used. 31010936 avail Mem
  PID USER     PR NI VIRT    RES    SHR    S %CPU %MEM TIME+     COMMAND
 7633 root     20  0 1237724 60736  28932  S  2.0  0.1 0:00.91   orarootage+
10619 grid     20  0 7124    2372   1440   R  2.0  0.0 0:00.02   top
 5711 root     20  0 2900512 89540  28720  S  1.0  0.1 0:06.56   ohasd.bin
-- IFCONFIG
zzz ***Sat Jan 4 10:29:09 CST 2020
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500
        inet6 fe80::8006:9bff:feb4:ea26 prefixlen 64 scopeid 0x20
        ether 82:06:9b:b4:ea:26 txqueuelen 1000 (Ethernet)
        RX packets 209436160 bytes 143740137716 (133.8 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 121960004 bytes 46934313850 (43.7 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
bond0.3112: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 28.83.70.2 netmask 255.255.255.0 broadcast 28.83.70.255
        inet6 fe80::8006:9bff:feb4:ea26 prefixlen 64 scopeid 0x20
        ether 82:06:9b:b4:ea:26 txqueuelen 1000 (Ethernet)
        RX packets 122078240 bytes 117461299874 (109.3 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 87350042 bytes 36732906875 (34.2 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
bond0.3112:1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 28.83.70.4 netmask 255.255.255.0 broadcast 28.83.70.255    << this IP no longer exists in the next snapshot
        ether 82:06:9b:b4:ea:26 txqueuelen 1000 (Ethernet)
zzz ***Sat Jan 4 10:29:39 CST 2020
bond0: flags=5187<UP,BROADCAST,RUNNING,MASTER,MULTICAST> mtu 1500
        inet6 fe80::8006:9bff:feb4:ea26 prefixlen 64 scopeid 0x20
        ether 82:06:9b:b4:ea:26 txqueuelen 1000 (Ethernet)
        RX packets 209438026 bytes 143740812908 (133.8 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 121961571 bytes 46935014086 (43.7 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
bond0.3112: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 28.83.70.2 netmask 255.255.255.0 broadcast 28.83.70.255
        inet6 fe80::8006:9bff:feb4:ea26 prefixlen 64 scopeid 0x20
        ether 82:06:9b:b4:ea:26 txqueuelen 1000 (Ethernet)
        RX packets 122080043 bytes 117461945938 (109.3 GiB)
        RX errors 0 dropped 0 overruns 0 frame 0
        TX packets 87351401 bytes 36733594709 (34.2 GiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
Recommended logs to collect
— note: logs from all nodes are preferred
— OS
/var/log/messages
OSW logs covering the problem window. Today's systems are fast enough that ordering events on a timeline is difficult, so a collection interval of 3-5 s with gzip compression is recommended.
— GI
su - grid
cd $ORACLE_BASE/diag/crs/*/crs/trace
ls -l alert.log
ls -l crsd.trc
ls -l cssd.trc
ls -l ohasd_cssdagent_root.trc
— DB
su - oracle
cd $ORACLE_BASE/diag/rdbms/*/$ORACLE_SID/trace
ls -l alert*.log
ls -l *dbw*.trc
ls -l *lmhb*.trc
traces of the processes involved in the crash
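The collection checklist above can be wrapped in a small script to run on each node. A minimal sketch; the /u01/app/grid and /u01/app/oracle bases, and the archive name, are assumptions for illustration, so adjust them to your install:

```shell
#!/bin/sh
# Hedged sketch: bundle the OS, GI, and DB logs listed above into one
# archive per node. Paths assume a standard 19c layout and are NOT taken
# from the original post; edit them to match your environment.
HOST=$(hostname -s 2>/dev/null || echo unknown)
STAMP=$(date +%Y%m%d_%H%M%S)
OUT="/tmp/ora_diag_${HOST}_${STAMP}.tar.gz"
# Collect whatever exists; errors for missing files are suppressed so the
# script still works on nodes where some traces were never written.
tar czf "$OUT" \
    /var/log/messages* \
    /u01/app/grid/diag/crs/*/crs/trace/alert*.log \
    /u01/app/grid/diag/crs/*/crs/trace/crsd*.trc \
    /u01/app/grid/diag/crs/*/crs/trace/ocssd*.trc \
    /u01/app/grid/diag/crs/*/crs/trace/ohasd_cssdagent_root*.trc \
    /u01/app/oracle/diag/rdbms/*/*/trace/alert*.log \
    2>/dev/null || true
echo "archive: $OUT"
```

Run it as root on every node around the problem window, then upload the resulting /tmp archives together so timelines can be compared across nodes.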
Avahi (the avahi daemon)
Avahi (the avahi daemon) is a free implementation of Apple's "ZeroConf" (Bonjour) and uses the mDNS/DNS-SD protocol suite for service discovery, scanning the network for printers and other shared resources; it can also provide DNS and DHCP services. That is very useful on a laptop, but it is not needed on a server, where it can degrade network performance and become unstable under heavy load. Avahi brings no benefit to Oracle, so disabling the service in Oracle environments is recommended. Oracle has reported that avahi-daemon can interfere with Oracle RAC's multicast heartbeat, causing the application-layer interface to assume it has been disconnected on the node and to restart.
The avahi daemon can also spin or accumulate high CPU time.
Bug# 14027941 Possible rac eviction avahi-daemon: withdrawing address record
Bug# 14739888 – CSSD FROM NODE 2 CANNOT JOIN THE CLUSTER AFTER REBOOT
Bug# 12717666 : LNX64-11203 CVU TO CHECK IF AVAHI-DAEMON IS DISABLED AND PROVIDE FIX-UP SCRIPT
As recorded in the OS messages log:
anboblpa4 avahi-daemon[4537]: Withdrawing address record for 28.83.70.4 on bond0.3112
Write-ups online note that this "Withdrawing address" message only means avahi-daemon observed the IP address being withdrawn; avahi-daemon itself did not initiate the removal.
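To correlate these withdrawals with instance restarts, the messages log can simply be grepped for the avahi pattern. A minimal sketch; the count_withdrawals helper and the sample file are ours for illustration, though the matching log line is taken from this case:

```shell
#!/bin/sh
# count_withdrawals: count avahi "Withdrawing address record" events in a
# /var/log/messages-style file. Helper name is hypothetical, not from the post.
count_withdrawals() {
    grep -c 'avahi-daemon\[[0-9]*\]: Withdrawing address record' "$1"
}

# Demo against a two-line sample from this case; on a real system point the
# helper at /var/log/messages (and its rotated copies) instead.
SAMPLE=/tmp/messages.sample
cat > "$SAMPLE" <<'EOF'
Jan  4 10:29:18 anboblpa4 systemd-logind: Removed session 43767.
Jan  4 10:29:19 anboblpa4 avahi-daemon[4537]: Withdrawing address record for 28.83.70.4 on bond0.3112.
EOF
N=$(count_withdrawals "$SAMPLE")
echo "withdrawals: $N"
# prints: withdrawals: 1
```

If every restart is preceded by such an event on the VIP's interface, that supports the avahi/IP-withdrawal theory described above.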
Disabling Avahi
To stop the avahi-daemon on OL5/OL6:
# service avahi-dnsconfd stop    # ignore any errors
# service avahi-daemon stop
# chkconfig avahi-dnsconfd off
# chkconfig avahi-daemon off
For OL7:
# systemctl stop avahi-daemon.socket avahi-daemon.service
# systemctl disable avahi-daemon.socket avahi-daemon.service
Note: this case also exposed another bad practice. Oracle explicitly states that network interface names must not contain a "." (yet the interfaces here were VLAN sub-interfaces such as bond0.3112).
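Dotted interface names can be caught before installation with a simple check, similar in spirit to what CVU does. A minimal sketch; the dotted_ifnames helper is ours, not an Oracle tool:

```shell
#!/bin/sh
# dotted_ifnames: read one interface name per line and print those that
# contain a '.' (the naming Oracle warns against for cluster interfaces).
dotted_ifnames() {
    grep '\.' || true
}

# Demo with the interfaces seen in this case; on a live host feed it with:
#   ls /sys/class/net | dotted_ifnames
BAD=$(printf 'bond0\nbond0.3112\nlo\n' | dotted_ifnames)
echo "dotted interfaces: $BAD"
# prints: dotted interfaces: bond0.3112
```

A common workaround is to give the VLAN device a dot-free name (for example via the PHYSDEV/VLAN ifcfg options or `ip link` naming) before configuring it as a Grid Infrastructure interface.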