
Troubleshooting 19c RAC: CRS db resource shows “UNKNOWN” state, srvctl start instance fails with CRS-2680

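On an Oracle 19c RAC cluster, crsctl reported the database resource in the “UNKNOWN” state. The database instance could still be started with sqlplus, but srvctl status instance reported it as not running. Starting the instance manually with srvctl failed with the following errors: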

[oracle@~]$ srvctl start instance -d  -i INTS1
PRCR-1013 : Failed to start resource ora..db
PRCR-1064 : Failed to start resource ora..db on node 
CRS-2680: Clean of 'ora..db' on '' failed
CRS-5802: Unable to start the agent process

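We had already tried removing the instance and the database with srvctl remove instance and srvctl remove database. The behavior did not change.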

GI alert log

2020-12-05 15:52:11.687 [CRSD(7541)]CRS-2758: Resource 'ora.hmracdg.db' is in an unknown state.
2020-12-05 15:57:50.688 [CRSD(7541)]CRS-5828: Could not start agent '/u01/app/19.3.0/grid/bin/oraagent_oracle'. Details at (:CRSAGF00130:) {1:2872:4034} in /u01/app/grid/diag/crs/anbob1/crs/trace/crsd.trc.
2020-12-05 16:07:52.408 [CRSD(7541)]CRS-5828: Could not start agent '/u01/app/19.3.0/grid/bin/oraagent_oracle'. Details at (:CRSAGF00130:) {1:2872:4288} in /u01/app/grid/diag/crs/anbob1/crs/trace/crsd.trc.

CRSD trace (crsd.trc):

2020-12-05 16:39:35.288 :   CRSPE:585066240: [     INFO] {1:2872:5072} Expression Filter : ((LAST_SERVER == anbob01) AND (NAME == ora.scan1.vip))
2020-12-05 16:39:35.291 :UiServer:578762496: [     INFO] {1:2872:5072} Done for ctx=0x7fa7e003bb10
2020-12-05 16:39:39.934 :GIPCHTHR:3034482432:  gipchaDaemonWork: DaemonThread heart beat, time interval since last heartBeat 30830loopCount 28
2020-12-05 16:40:00.294 :    CRSD:597673728: [     NONE] {1:2872:4988} {1:2872:4988} Created alert : (:CRSAGF00130:) :  Failed to start the agent /u01/app/19.3.0/grid/bin/oraagent_oracle
2020-12-05 16:40:00.294 :    AGFW:597673728: [     INFO] {1:2872:4988} Rejecting pending msgs for ora.anbob.db 1 1
2020-12-05 16:40:00.294 :    AGFW:597673728: [     INFO] {1:2872:4988} Rejecting msg: 4100
2020-12-05 16:40:00.294 :    AGFW:597673728: [     INFO] {1:2872:4988} Agfw Proxy Server sending the last reply to PE for message:RESOURCE_CLEAN[ora.anbob.db 1 1] ID 4100:11921
2020-12-05 16:40:00.294 :    AGFW:597673728: [     INFO] {1:2872:4988} Can not stop the agent: /u01/app/19.3.0/grid/bin/oraagent_oracle because pid is not initialized
2020-12-05 16:40:00.294 :   CRSPE:585066240: [     INFO] {1:2872:4988} Received reply to action [Clean] message ID: 11921
2020-12-05 16:40:00.294 :   CRSPE:585066240: [     INFO] {1:2872:4988} RI [ora.anbob.db 1 1] new internal state: [STABLE] old value: [CLEANING]
2020-12-05 16:40:00.294 :   CRSPE:585066240: [     INFO] {1:2872:4988} Fatal Error from AGFW Proxy: Unable to start the agent process
2020-12-05 16:40:00.294 :   CRSPE:585066240: [     INFO] {1:2872:4988} CRS-2680: Clean of 'ora.anbob.db' on 'anbob01' failed

2020-12-05 16:40:00.294 :   CRSPE:585066240: [     INFO] {1:2872:4988} Command [0x7fa7f446c290] has sent a progress reply:CRS-2680: Clean of 'ora.anbob.db' on 'anbob01' /
 for [ora.anbob.db]
2020-12-05 16:40:00.294 :UiServer:578762496: [     INFO] {1:2872:4988} Response: c4|5!ORDERk7|MESSAGEt57|CRS-2680: Clean of 'ora.anbob.db' on 'anbob01' failedk7|MSGTYPEt1|1k5|OBJIDt14|ora.anbob.dbk4|WAITt1|0
2020-12-05 16:40:00.295 :   CRSPE:585066240: [     INFO] {1:2872:4988} Sequencer for [ora.anbob.db 1 1] has completed with error: CRS-5802: Unable to start the agent process

2020-12-05 16:40:00.295 :   CRSPE:585066240: [     INFO] {1:2872:4988} Deleting RI-path from op-history:ora.anbob.db 1 1
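Most lines in both logs carry a request tag in braces, such as {1:2872:4988}. The tag appears to identify one CRS request across components. The CRS-5828 lines in the alert log also name the trace file that holds the details. Below is a minimal Python sketch that pulls the tag and trace path out of an alert line and then keeps only the trace lines with that tag. The helper names are hypothetical, and the regular expressions assume only the line formats shown above.

```python
import re
from typing import Iterable, List, Optional, Tuple

# Request tag such as {1:2872:4988}, seen in both the alert log and crsd.trc.
TAG_RE = re.compile(r"\{\d+:\d+:\d+\}")
# Trailing "in /path/crsd.trc." reference at the end of an alert log line.
TRACE_RE = re.compile(r" in (\S+\.trc)\.?$")
# Any CRS-nnnn message code.
CRS_CODE_RE = re.compile(r"CRS-\d{4}")


def alert_reference(line: str) -> Optional[Tuple[str, str]]:
    """Return (tag, trace_file) from an alert log line, or None if either is missing."""
    tag = TAG_RE.search(line)
    trace = TRACE_RE.search(line.rstrip())
    if tag is None or trace is None:
        return None
    return tag.group(0), trace.group(1)


def lines_for_tag(lines: Iterable[str], tag: str) -> List[str]:
    """Keep only the trace lines that carry the given request tag."""
    return [ln.rstrip("\n") for ln in lines if tag in ln]


def crs_codes(lines: Iterable[str]) -> List[str]:
    """Distinct CRS-nnnn codes, in order of first appearance."""
    seen: List[str] = []
    for ln in lines:
        for code in CRS_CODE_RE.findall(ln):
            if code not in seen:
                seen.append(code)
    return seen
```

To use it, pass a CRS-5828 line to alert_reference and open the trace file it returns. Then pass that file's lines and the tag to lines_for_tag. crs_codes summarizes which CRS errors that request produced. The alert and trace excerpts above were captured at different times, so their tags differ.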

The oraagent process was failing to start. On 11g, the place to check is the permissions of the agent log directory:

$ ls -ld /log//agent/crsd/oraagent_oracle
drwxrwxrwt. 2 oracle oinstall 4096 Aug 22 10:52 /log//agent/crsd/oraagent_oracle
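The drwxrwxrwt mode above is octal 1777. That means read, write and execute for everyone, plus the sticky bit, which is the same mode /tmp normally has. Below is a minimal Python sketch that checks a directory against that mode. The path listed above is truncated, so no path is hard-coded. Pass your own agent log directory on the command line.

```python
import os
import stat
import sys


def is_sticky_world_writable(path: str) -> bool:
    """True if `path` is a directory whose permission bits are exactly 1777 (drwxrwxrwt)."""
    st = os.stat(path)
    return stat.S_ISDIR(st.st_mode) and stat.S_IMODE(st.st_mode) == 0o1777


if __name__ == "__main__":
    # Each command-line argument is a directory to check.
    for p in sys.argv[1:]:
        mode = oct(stat.S_IMODE(os.stat(p).st_mode))
        verdict = "OK" if is_sticky_world_writable(p) else "differs from 1777"
        print(f"{p}: {mode} {verdict}")
```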

Starting with Grid Infrastructure 12.1.0.2, each daemon's pid file lives not only in //.pid but also in /crsdata//output/.pid. As documented in MOS note 2028511.1, look for newly created oraagent*.out files under /tmp:

/tmp/oraagent_nnnn.out

Oracle Clusterware infrastructure error in ORAAGENT (OS PID 4976): Error in an OS-dependent function or service
Error category: -2, operation: open, location: SCLSB00009, OS error: 13
OS error message: Permission denied
Additional information: Call to open daemon stdout/stderr file failed
Oracle Clusterware infrastructure fatal error in ORAAGENT (OS PID 4976): Internal error (ID (:CLSB00126:)) - Failed to redirect daemon standard outputs using location /u01/app/grid/crsdata/anbob1/output and root name crsd_oraagent_oracle
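On Linux, OS error 13 is EACCES (“Permission denied”). The agent therefore could not open its stdout/stderr file in the crsdata output directory. Below is a minimal Python sketch that lists the newest oraagent*.out files in /tmp and extracts the OS error and the failing location from each one. The field names and regular expressions are assumptions based only on the message text quoted above.

```python
import errno
import glob
import os
import re
from typing import Dict, List

OS_ERROR_RE = re.compile(r"OS error: (\d+)")
OS_MSG_RE = re.compile(r"OS error message: (.+)")
LOCATION_RE = re.compile(r"using location (\S+) and root name (\S+)")


def newest_agent_outputs(directory: str = "/tmp", pattern: str = "oraagent*.out",
                         limit: int = 5) -> List[str]:
    """Return up to `limit` files matching `pattern`, newest modification time first."""
    files = glob.glob(os.path.join(directory, pattern))
    files.sort(key=os.path.getmtime, reverse=True)
    return files[:limit]


def summarize(path: str) -> Dict[str, str]:
    """Extract the OS error, its errno name and the output location from one .out file."""
    with open(path, errors="replace") as fh:
        text = fh.read()
    info: Dict[str, str] = {}
    m = OS_ERROR_RE.search(text)
    if m:
        info["os_error"] = m.group(1)
        info["errno_name"] = errno.errorcode.get(int(m.group(1)), "unknown")
    m = OS_MSG_RE.search(text)
    if m:
        info["os_message"] = m.group(1).strip()
    m = LOCATION_RE.search(text)
    if m:
        info["location"], info["root_name"] = m.group(1), m.group(2)
    return info


if __name__ == "__main__":
    # Adjust the pattern if the file names on your system differ.
    for path in newest_agent_outputs():
        print(path, summarize(path))
```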

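cluvfy comp software -n all -verbose did not reveal the permission problem, because it only checks the binary files. So we manually checked the permissions of the pid files under /crsdata//output/.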

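It turned out that every file in that directory had been changed with chown grid:oinstall * and chmod 775 *. DBAs should treat a database with some respect. Do not assume that granting 777 makes everything OK, and remember that not every file under GRID_HOME is owned by grid. If such a mistake happens, follow How to check and fix file permissions on Grid Infrastructure environment (Doc ID 1931142.1) to repair the binary files. Then fix the remaining files on the broken node by comparing them with a healthy node.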
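Comparing against a healthy node can be made mechanical. First record the owner, group and mode of every file under a directory on the good node, then diff that record against the broken node. Below is a minimal Python sketch of that approach. The function names, the JSON snapshot file and the "<host>" placeholder are all hypothetical. The sketch only reports differences and changes nothing.

```python
import grp
import json
import os
import pwd
import stat
from typing import Dict, List

Entry = Dict[str, str]


def _name(lookup, ident: int) -> str:
    """Resolve a uid/gid to a name, falling back to the number if it is unknown."""
    try:
        return lookup(ident)[0]
    except KeyError:
        return str(ident)


def snapshot(root: str, host: str = "") -> Dict[str, Entry]:
    """Map each file under `root` (relative path) to its owner, group and octal mode.

    If `host` is given, it is replaced by "<host>" in the paths so that
    host-named files (such as the pid files) line up across nodes.
    """
    result: Dict[str, Entry] = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.lstat(full)
            rel = os.path.relpath(full, root)
            if host:
                rel = rel.replace(host, "<host>")
            result[rel] = {
                "owner": _name(pwd.getpwuid, st.st_uid),
                "group": _name(grp.getgrgid, st.st_gid),
                "mode": oct(stat.S_IMODE(st.st_mode)),
            }
    return result


def diff(good: Dict[str, Entry], bad: Dict[str, Entry]) -> List[str]:
    """List files whose owner, group or mode on the broken node differs from the good node."""
    report: List[str] = []
    for rel in sorted(good):
        if rel not in bad:
            report.append(f"{rel}: missing")
            continue
        for key in ("owner", "group", "mode"):
            if good[rel][key] != bad[rel][key]:
                report.append(f"{rel}: {key} is {bad[rel][key]}, expected {good[rel][key]}")
    return report


def save(snap: Dict[str, Entry], path: str) -> None:
    with open(path, "w") as fh:
        json.dump(snap, fh, indent=2, sort_keys=True)


def load(path: str) -> Dict[str, Entry]:
    with open(path) as fh:
        return json.load(fh)
```

The workflow has three steps:

1. On the healthy node, run save(snapshot(directory, host=...), "good.json").
2. Copy good.json to the broken node.
3. On the broken node, print diff(load("good.json"), snapshot(directory, host=...)) and fix each reported file deliberately.

Owner and group names are compared as strings, so this assumes both nodes use the same user and group names.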

The normal ownership and permissions of the pid files under GRID_HOME are:

-rw-r--r--. 1 root root 0 Jul 29 14:52 ./crs/init/lccn0
-rw-r--r--. 1 root root 5 Dec 1 08:35 ./crs/init/lccn0.pid
-rw-r--r--. 1 root root 0 Jul 29 14:51 ./ctss/init/lccn0
-rw-r--r--. 1 root root 5 Dec 1 08:35 ./ctss/init/lccn0.pid
-rw-r--r--. 1 grid oinstall 0 Jul 29 14:50 ./evm/init/lccn0
-rw-r--r--. 1 grid oinstall 5 Dec 1 08:34 ./evm/init/lccn0.pid
-rw-r--r--. 1 grid oinstall 5 Dec 1 08:34 ./gipc/init/lccn0
-rw-r--r--. 1 grid oinstall 5 Dec 1 08:34 ./gipc/init/lccn0.pid
-rw-r--r--. 1 grid oinstall 0 Jul 29 14:50 ./gpnp/init/lccn0
-rw-r--r--. 1 grid oinstall 5 Dec 1 08:34 ./gpnp/init/lccn0.pid
-rw-r--r--. 1 grid oinstall 0 Jul 29 14:50 ./mdns/init/lccn0
-rw-r--r--. 1 grid oinstall 5 Dec 1 08:34 ./mdns/init/lccn0.pid
-rw-r--r--. 1 root root 0 Jul 29 14:50 ./ohasd/init/lccn0
-rw-r--r--. 1 root root 5 Dec 1 08:34 ./ohasd/init/lccn0.pid
-rw-r--r--. 1 root root 0 Jul 29 14:54 ./ologgerd/init/lccn0
-rw-r--r--. 1 root root 5 Dec 1 08:35 ./ologgerd/init/lccn0.pid
-rw-r--r--. 1 root root 0 Jul 29 14:52 ./osysmond/init/lccn0
-rw-r--r--. 1 root root 5 Dec 1 08:35 ./osysmond/init/lccn0.pid
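A listing like the one above, taken with ls -l on a healthy node, can also be turned into chown/chmod commands for review. Below is a minimal Python sketch built on several assumptions:

- The input has exactly the ls -l layout shown above. The regular expression does not handle every ls format.
- The paths are relative to GRID_HOME.
- Host names in the file names (lccn0 here) have been replaced with the target node's name before parsing.

The function names are hypothetical. The sketch only builds the commands and runs nothing.

```python
import os
import re
import shlex
from typing import List, NamedTuple


class Expected(NamedTuple):
    path: str
    owner: str
    group: str
    mode: int  # permission bits, e.g. 0o644


# Matches lines such as: -rw-r--r--. 1 root root 0 Jul 29 14:52 ./crs/init/lccn0
LS_RE = re.compile(
    r"^[-dlbcps]([rwxsStT-]{9})\S*\s+\d+\s+(\S+)\s+(\S+)\s+\d+\s+\w{3}\s+\d+\s+\S+\s+(.+)$"
)


def perm_bits(chars: str) -> int:
    """Convert 9 ls permission characters (e.g. 'rw-r--r--') to numeric mode bits."""
    special = (0o4000, 0o2000, 0o1000)  # setuid, setgid, sticky
    mode = 0
    for i in range(3):
        r, w, x = chars[3 * i:3 * i + 3]
        shift = 6 - 3 * i
        if r == "r":
            mode |= 0o4 << shift
        if w == "w":
            mode |= 0o2 << shift
        if x in ("x", "s", "t"):
            mode |= 0o1 << shift
        if x in ("s", "S", "t", "T"):
            mode |= special[i]
    return mode


def parse_listing(text: str) -> List[Expected]:
    """Parse ls -l lines into expected owner/group/mode entries; other lines are skipped."""
    entries: List[Expected] = []
    for line in text.splitlines():
        m = LS_RE.match(line.strip())
        if m:
            perms, owner, group, path = m.groups()
            entries.append(Expected(path, owner, group, perm_bits(perms)))
    return entries


def fix_commands(entries: List[Expected], base: str) -> List[str]:
    """Build (but do not run) chown/chmod commands for paths relative to `base`."""
    cmds: List[str] = []
    for e in entries:
        target = shlex.quote(os.path.normpath(os.path.join(base, e.path)))
        cmds.append(f"chown {e.owner}:{e.group} {target}")
        cmds.append(f"chmod {e.mode:o} {target}")
    return cmds
```

For example, fix_commands(parse_listing(text), "/u01/app/19.3.0/grid") uses the GI home that appears in the alert log above. Review the printed commands before running any of them as root. That is the point of this post: apply targeted fixes instead of a blanket chown or chmod.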

Solution:
After manually correcting the permissions of the pid files and restarting CRS, everything returned to normal.
