Troubleshooting Oracle ASM instance crash with ‘Linux-x86_64 Error: 24: Too many open files’

In an Oracle 11g R2 RAC cluster, instance 1 on one of the nodes crashed and restarted. The logs contained the message "Linux-x86_64 Error: 24: Too many open files".

DB Alert log

Thu Dec 19 06:09:16 2024
NOTE: ASMB terminating
Errors in file /u/app/oracle/diag/rdbms/anbobdb/anbob1/trace/anbob1_asmb_71126.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel

Session ID: 2109 Serial number: 3
Errors in file /u/app/oracle/diag/rdbms/anbobdb/anbob1/trace/anbob1_asmb_71126.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID: 
Session ID: 2109 Serial number: 3
ASMB (ospid: 71126): terminating the instance due to error 15064
Thu Dec 19 06:09:17 2024
System state dump requested by (instance=1, osid=71126 (ASMB)), summary=[abnormal instance termination].
System State dumped to trace file /u/app/oracle/diag/rdbms/anbobdb/anbob1/trace/anbob1_diag_71061.trc

ASM Alert log

Thu Dec 19 06:09:16 2024
PMON (ospid: 70549): terminating the instance due to error 488
Thu Dec 19 06:09:17 2024
System state dump requested by (instance=1, osid=70549 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u/app/base/diag/asm/+asm/+ASM1/trace/+ASM1_diag_70566.trc
Dumping diagnostic data in directory=[cdmp_20241219060917], requested by (instance=1, osid=70549 (PMON)), summary=[abnormal instance termination].

It is also advisable to check the OS messages log and the OSWatcher (OSW) resource usage data for the same time window.

ASM1_rbal_70592.trc

*** 2024-12-19 06:09:15.947
** DBGRL Error: ARB Alert Log
** DBGRL Error: SLERC_OERC, 48180
** DBGRL Error: Linux-x86_64 Error: 24: Too many open files  
Additional information: 1
** DBGRL Error: <msg time='2024-12-19T06:09:15.944+08:00' org_id='oracle' comp_id='asm' client_id='' type='UNKNOWN' level='16'
 host_id='anbob2-node1' host_addr='10.65.15.xx' module='' pid='70592'>
 <txt>Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0
** DBGRL Error: Text Alert Log
** DBGRL Error: SLERC_OERC, 48180
** DBGRL Error: Linux-x86_64 Error: 24: Too many open files 
Additional information: 1
** DBGRL Error: Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x5C] [PC:0x8238B00, lxhh2ci()+4] [flags: 0x0, count: 1]
Cannot open /proc/self/exe for reading: errno=24  

/proc/self/exe
/proc/self/exe is a special file on Linux and part of the /proc virtual filesystem. It is a symbolic link that points to the executable of the currently running process.
Sample code

[root@db1 ~]# cat test.c
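#include <stdio.h>   /* printf, perror */
#include <unistd.h>  /* readlink, ssize_t */

int main() {
    char path[1024];
    /* readlink() does not NUL-terminate, so leave room for the terminator */
    ssize_t len = readlink("/proc/self/exe", path, sizeof(path) - 1);
    if (len != -1) {
        path[len] = '\0';
        printf("Executable path: %s\n", path);
    } else {
        perror("readlink");
    }
    return 0;
}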

[root@db1 ~]# gcc test.c -o test
[root@db1 ~]# ./test
Executable path: /root/test

[root@db1 ~]# readlink /proc/self/exe
/usr/bin/readlink

Note:

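self is a special identifier that always refers to the current process, so whichever process reads /proc/self/exe gets the path of its own executable. This is useful, for example, for an application that updates itself. The problem is therefore not /proc/self/exe: the open of that file failed only because the process had already used up its open file descriptors, which is exactly what errno 24 (EMFILE, "Too many open files") means.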

User limits

-- file /etc/security/limits.conf
##########set oracle environment##########
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 131072

grid soft nproc 2047
grid hard nproc 16384
grid soft nofile 1024
grid hard nofile 131072

Too many open files
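If you see the "Too many open files" error on Linux, the process has reached the maximum number of files it is allowed to have open, which is often 1024 by default (the soft nofile limit).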

Use this command to see the system-wide maximum number of file handles:
cat /proc/sys/fs/file-max

To find the maximum number of files a single process can open, use the ulimit command with the -n (open files) option:
ulimit -n

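In practice, of course, you may not know which process has just consumed all the file handles. To start the investigation, you can use the following pipeline, which lists the 10 processes (by command name and PID) holding the most entries in lsof output on the machine: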
lsof | awk '{ print $1 " " $2; }' | sort -rn | uniq -c | sort -rn | head

For this case, the recommendation is to adjust the Linux limits and raise the open files (nofile) limit.

For the limit on the maximum number of processes, see the separate blog post "Troubleshooting errors caused by OS resource limit on AIX,HP-UX, SolarisOS, Linux".
