Troubleshooting: DB connection fails with ORA-12541 because adump/trace inode usage hit 100%
Right after wrapping up an ORA-12154 case, an ORA-12541 "no listener" showed up, another connection-time problem. The application side reported that connecting to the database through the public IP returned ORA-12541. Checking the listener showed no public IP endpoint, yet the listener process was running normally and the service and VIP endpoints were still registered. This post is a short record of the issue.
Environment: 11.2.0.4 RAC on Linux; the public IP ends in .100, the VIP in .101.
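The client-side symptom, as a sketch (the IPs are masked the same way as in the environment note above, and the connect string is illustrative):

$ sqlplus system/***@"(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.*.*.100)(PORT=1521))(CONNECT_DATA=(SERVICE_NAME=anbob)))"
ERROR: ORA-12541: TNS:no listener

Connecting through the VIP (10.*.*.101) still worked, since that endpoint remained registered with the listener (see the lsnrctl output below).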
Troubleshooting approach
1, Check the listener
2, Check the IP addresses and NIC status
3, Check the CRS status
Check the listener
$ lsnrctl status

LSNRCTL for Linux: Version 11.2.0.4.0 - Production on 05-MAY-2022 14:26:06

Copyright (c) 1991, 2013, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=LISTENER)))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for Linux: Version 11.2.0.4.0 - Production
Start Date                08-SEP-2018 05:38:08
Uptime                    341 days 3 hr. 52 min. 12 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      OFF
Listener Parameter File   /oracle/app/11.2.0.4/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=10.*.*.101)(PORT=1521)))
Services Summary...
Service "+ASM" has 1 instance(s).
  Instance "+ASM2", status READY, has 1 handler(s) for this service...
Service "anbob" has 1 instance(s).
  Instance "anbob2", status READY, has 1 handler(s) for this service...
The command completed successfully
Check the NIC
$ ip addr
...
21: bond1: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP
    link/ether 34:6a:c2:b9:4e:07 brd ff:ff:ff:ff:ff:ff
    inet 10.*.*.100/26 brd 10.*.*.127 scope global bond1                    <<<<<
    inet 10.*.*.101/26 brd 10.*.*.127 scope global secondary bond1:1
    inet 10.*.*.120/26 brd 10.*.*.127 scope global secondary bond1:3
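So the public IP (.100) is still plumbed on bond1, while the listener only has an endpoint on the VIP (.101). A quick port probe makes the mismatch visible; a sketch, assuming nc is installed and substituting the masked IPs shown above:

$ nc -vz 10.*.*.101 1521    # VIP endpoint is still registered: connection succeeds
$ nc -vz 10.*.*.100 1521    # no endpoint on the public IP: connection refused, clients see ORA-12541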
Check CRS
grid@anbob2:/home/grid>crsctl stat res -t
CRS-4535: Cannot communicate with Cluster Ready Services
CRS-4000: Command Status failed, or completed with errors.

-- check the CRS daemon resources: ora.crsd is offline
grid@anbob2:/home/grid>crsctl stat res -t -init
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.asm
      1        ONLINE  ONLINE       anbob2                   Started
ora.cluster_interconnect.haip
      1        ONLINE  ONLINE       anbob2
ora.crf
      1        ONLINE  ONLINE       anbob2
ora.crsd
      1        ONLINE  OFFLINE                                              <<<<<
ora.cssd
      1        ONLINE  ONLINE       anbob2
ora.cssdmonitor
      1        ONLINE  ONLINE       anbob2
ora.ctssd
      1        ONLINE  ONLINE       anbob2                   OBSERVER
ora.diskmon
      1        OFFLINE OFFLINE
ora.evmd
      1        ONLINE  ONLINE       anbob2
ora.gipcd
      1        ONLINE  ONLINE       anbob2
ora.gpnpd
      1        ONLINE  ONLINE       anbob2
ora.mdnsd
      1        ONLINE  ONLINE       anbob2
Note:
The ora.crsd resource is OFFLINE.
Check the CRSD log
2022-05-02 13:23:15.828:
[ohasd(32935)]CRS-10000:CLSU-00100: Operating System function: mkdir failed with error data: 28
CLSU-00101: Operating System error message: No space left on device
CLSU-00103: error location: authprep6
CLSU-00104: additional error information: failed to make dir /oracle/app/11.2.0.4/grid/auth/ohasd/anbob2/A8421321
2022-05-02 13:23:29.018:
[crsd(42907)]CRS-1013:The OCR location in an ASM disk group is inaccessible. Details in /oracle/app/11.2.0.4/grid/log/anbob2/crsd/crsd.log.
2022-05-02 13:23:29.047:
[crsd(42907)]CRS-0804:Cluster Ready Service aborted due to Oracle Cluster Registry error [PROC-26: Error while accessing the physical storage
ORA-09925: Unable to create audit trail file
Linux-x86_64 Error: 28: No space left on device
Additional information: 9925
]. Details at (:CRSD00111:) in /oracle/app/11.2.0.4/grid/log/anbob2/crsd/crsd.log.
2022-05-02 13:23:29.157:
...
2022-05-02 13:25:20.344:
[ohasd(32935)]CRS-2765:Resource 'ora.crsd' has failed on server 'anbob2'.
2022-05-02 13:25:20.344:
[ohasd(32935)]CRS-2771:Maximum restart attempts reached for resource 'ora.crsd'; will not restart.
Note:
CRSD aborted because of an OS disk resource problem: the file system returned "No space left on device" (error 28) when creating files, including the audit trail file (ORA-09925), and ohasd eventually stopped restarting ora.crsd.
Check the file system
$ df
$ df -i
Note:
The file system that holds the ORACLE_BASE directory was at 100% inode usage. A short introduction to inodes follows.
Linux uses inodes (index nodes) to keep track of every file in the system, whether it is an image, video, email, spam, website content, or a backup. Every file system has a limit on the number of inodes, typically fixed when the file system is created. To find the directory with the most inode usage (i.e. the largest number of files), count the files under each top-level directory:
for i in /*; do echo $i; find $i |wc -l; done
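In this case the inodes were eaten by large numbers of small files under ORACLE_BASE, which matches the ORA-09925 (audit trail file) in the CRS log above. A minimal cleanup sketch, assuming the usual adump/diag locations and a 30-day retention; the paths and retention are illustrative, so adapt them to your own audit and trace policy before deleting anything:

$ find $ORACLE_BASE/admin/*/adump -name "*.aud" -mtime +30 -delete
$ find $ORACLE_BASE/diag -name "*.tr[cm]" -mtime +30 -delete
$ df -i                                   # confirm that IUse% has dropped
# crsctl start resource ora.crsd -init    # as root; ohasd hit the maximum restart attempts, so start crsd manually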
Why does crsd.bin affect the public IP?
The problem can be reproduced in a test environment: after killing crsd.bin, the public IP endpoint disappears from the listener immediately; once crsd.bin is started again automatically, the public IP is registered with the listener again. All of this is done by the oraagent process without any manual intervention, and when the crsd and oraagent processes restart automatically the missing endpoint should be registered within about one minute. The impact is not limited to the local listener; the SCAN listeners are affected as well. Because the listener's dynamic endpoints are managed by the oraagent process, oraagent dies together with crsd.bin; at that point the VIP endpoint is still in use by the database instance, so the VIP endpoint is kept while the public IP endpoint disappears.
Listener dynamic endpoints are registered by the oraagent process. When crsd.bin dies, the oraagent process also dies, and the dynamic endpoints are not available again until oraagent is restarted and re-registers them. This is expected behavior.
However, the listener will not drop a dynamic endpoint that is still in use by an instance, even after the oraagent process is terminated. This can be seen in the lsnrctl status output above: the host VIP endpoint remains after crsd.bin crashes.
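A sketch of the reproduction described above, for a test environment only:

$ lsnrctl status | grep "PROTOCOL=tcp"    # both the public IP and the VIP endpoints are listed
# kill -9 `pgrep -x crsd.bin`             # as root; oraagent dies together with crsd.bin
$ lsnrctl status | grep "PROTOCOL=tcp"    # only the VIP endpoint (in use by the instance) remains
# once ohasd has restarted crsd and oraagent, the public IP endpoint is re-registered within about a minute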
Read more…
1, Starting with 11.2, the Grid agent dynamically registers endpoints (VIP and public IP) with the listener. The agent obtains the public IP from /etc/hosts (when DNS is not used). If the agent fails to resolve the public IP, for example because of an incorrect permission on /etc/nsswitch.conf, the listener endpoint is not created.
2, The IP address was changed on this server, and lsnrctl status shows that the service name cannot be registered with the listener.
The value of the init parameter local_listener is read from tnsnames.ora only when the database instance starts, and it is stored in v$listener_network (X$KMMNV). Changes to tnsnames.ora are not reflected in v$listener_network automatically, so PMON/LREG keeps using the old value in v$listener_network for dynamic service registration. To pick up the changes in tnsnames.ora, set the init parameter local_listener again with the same alias (a sketch follows at the end of these notes).
Running “lsnrctl RELOAD” against a listener only affects dynamic database services, instances, service handlers, and listening endpoints.
Static ones, such as those in the ADDRESS section of the listener.ora file, are not changed.
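A minimal sketch of re-applying local_listener after a tnsnames.ora change; the alias name LISTENER_ANBOB is hypothetical, substitute the alias your local_listener actually points to:

$ sqlplus -s / as sysdba <<'EOF'
-- re-setting the parameter to the same alias forces it to be resolved from tnsnames.ora again
ALTER SYSTEM SET local_listener='LISTENER_ANBOB' SCOPE=BOTH SID='*';
-- ask PMON/LREG to register with the (new) listener address immediately
ALTER SYSTEM REGISTER;
-- check the value the instance currently holds
SELECT type, value FROM v$listener_network;
EOF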