Troubleshooting Oracle ASM instance crash with ‘Linux-x86_64 Error: 24: Too many open files’
On an Oracle 11g R2 RAC cluster, instance 1 on one of the nodes crashed and restarted. The logs contained the message "Linux-x86_64 Error: 24: Too many open files".
DB Alert log
Thu Dec 19 06:09:16 2024
NOTE: ASMB terminating
Errors in file /u/app/oracle/diag/rdbms/anbobdb/anbob1/trace/anbob1_asmb_71126.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Session ID: 2109 Serial number: 3
Errors in file /u/app/oracle/diag/rdbms/anbobdb/anbob1/trace/anbob1_asmb_71126.trc:
ORA-15064: communication failure with ASM instance
ORA-03113: end-of-file on communication channel
Process ID:
Session ID: 2109 Serial number: 3
ASMB (ospid: 71126): terminating the instance due to error 15064
Thu Dec 19 06:09:17 2024
System state dump requested by (instance=1, osid=71126 (ASMB)), summary=[abnormal instance termination].
System State dumped to trace file /u/app/oracle/diag/rdbms/anbobdb/anbob1/trace/anbob1_diag_71061.trc
ASM Alert log
Thu Dec 19 06:09:16 2024
PMON (ospid: 70549): terminating the instance due to error 488
Thu Dec 19 06:09:17 2024
System state dump requested by (instance=1, osid=70549 (PMON)), summary=[abnormal instance termination].
System State dumped to trace file /u/app/base/diag/asm/+asm/+ASM1/trace/+ASM1_diag_70566.trc
Dumping diagnostic data in directory=[cdmp_20241219060917], requested by (instance=1, osid=70549 (PMON)), summary=[abnormal instance termination].
It is also worth checking the OS messages log and the OSWatcher (OSW) resource-usage data for the same time window.
ASM1_rbal_70592.trc
*** 2024-12-19 06:09:15.947
** DBGRL Error: ARB Alert Log
** DBGRL Error: SLERC_OERC, 48180
** DBGRL Error: Linux-x86_64 Error: 24: Too many open files
Additional information: 1
** DBGRL Error: <msg time='2024-12-19T06:09:15.944+08:00' org_id='oracle' comp_id='asm' client_id='' type='UNKNOWN' level='16' host_id='anbob2-node1' host_addr='10.65.15.xx' module='' pid='70592'>
<txt>Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0
** DBGRL Error: Text Alert Log
** DBGRL Error: SLERC_OERC, 48180
** DBGRL Error: Linux-x86_64 Error: 24: Too many open files
Additional information: 1
** DBGRL Error: Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x5C] [PC:0x8238B00, lxhh2ci()+4] [flags: 0x0, count: 1]
Cannot open /proc/self/exe for reading: errno=24
/proc/self/exe
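/proc/self/exe is a special file on Linux, part of the /proc virtual file system. It is a symbolic link that points to the executable file of the currently running process.
Example code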
[root@db1 ~]# cat test.c
#include <stdio.h>
#include <unistd.h>

int main() {
    char path[1024];
    /* readlink() does not NUL-terminate, so keep one byte for the terminator */
    ssize_t len = readlink("/proc/self/exe", path, sizeof(path) - 1);
    if (len != -1) {
        path[len] = '\0';
        printf("Executable path: %s\n", path);
    } else {
        perror("readlink");
    }
    return 0;
}
[root@db1 ~]# gcc test.c -o test
[root@db1 ~]# ./test
Executable path: /root/test
[root@db1 ~]# readlink /proc/self/exe
/usr/bin/readlink
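Note: self is a special identifier that always refers to the current process, so whichever process reads /proc/self/exe gets the path of its own executable. An application can use this, for example, to update itself. The problem is therefore not /proc/self/exe itself but error 24, Too many open files: the process had no file descriptor left with which to open it.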
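The errno=24 on /proc/self/exe seen in the trace can be reproduced in a throwaway subshell. The following is a minimal sketch, not a description of what ASM does internally: it assumes a POSIX-style shell on Linux, and the helper name open_with_exhausted_fds is hypothetical. It lowers the soft open-files limit to 3, so that descriptors 0 to 2 are the only ones allowed, and then asks the shell to open /proc/self/exe.

```shell
#!/bin/sh
# Hypothetical helper: try to open /proc/self/exe when no descriptor is free.
# The body runs in a subshell, so the caller's own limits are not changed.
open_with_exhausted_fds() (
  LC_ALL=C; export LC_ALL   # keep the error text in English
  exec </dev/null           # make sure descriptor 0 is in use
  ulimit -Sn 3              # only descriptors 0-2 are now allowed
  true </proc/self/exe      # the shell must open the file for the redirection
)

# Capture the shell's error message; the open is expected to fail with
# errno 24 (EMFILE), reported as "Too many open files".
msg=$(open_with_exhausted_fds 2>&1) || true
echo "$msg"
```

The file itself is perfectly readable under normal limits; the open only fails because the kernel cannot allocate a new descriptor, which matches the "Cannot open /proc/self/exe for reading: errno=24" line in the trace.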
user limit
-- file /etc/security/limits.conf
##########set oracle environment##########
oracle soft nproc 2047
oracle hard nproc 16384
oracle soft nofile 1024
oracle hard nofile 131072
grid soft nproc 2047
grid hard nproc 16384
grid soft nofile 1024
grid hard nofile 131072
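The values in this file apply to new login sessions, so a process that is already running may be working under different limits. A small sketch for reading the limits a given process actually has, assuming Linux and its /proc/<pid>/limits file; the function name nofile_limits is hypothetical:

```shell
#!/bin/sh
# Print "soft hard" for the open-files limit of the given PID,
# taken from the "Max open files" row of /proc/<pid>/limits (Linux only).
nofile_limits() {
  awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Example: limits of the current shell.
nofile_limits "$$"
```

For the ASM instance, pass the PID of one of its background processes instead of "$$". Reading another user's /proc/<pid>/limits may require root.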
Too many open files
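If you see the "Too many open files" error message on Linux, the process has reached the maximum number of files it is allowed to have open. The soft limit is commonly 1024, which is also the soft nofile value configured above for the oracle and grid users.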
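The system-wide maximum number of file handles can be shown with:
cat /proc/sys/fs/file-max
To find the maximum number of files a single process may open (for the current shell and the processes it starts), use the ulimit command with the -n (open files) option:
ulimit -n
In practice you may not know which process has just consumed all the file handles. To start the investigation, the following pipeline lists the 10 processes holding the most file handles on the machine:
lsof | awk '{ print $1 " " $2; }' | sort -rn | uniq -c | sort -rn | head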
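lsof output also contains rows that are not file descriptors (cwd, txt, mem entries), so its counts can overstate descriptor usage. As an alternative sketch, assuming Linux, the descriptors of each process can be counted directly from /proc/<pid>/fd. The function names count_fds and top_fd_users are hypothetical, and other users' processes are only counted correctly when run as root:

```shell
#!/bin/sh
# Count the open descriptors of one PID by listing /proc/<pid>/fd.
# Prints 0 if the directory is unreadable or the process has exited.
count_fds() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}

# Print "count pid command" for the N processes with the most descriptors.
# Runs in a subshell so the loop variables do not leak into the caller.
top_fd_users() (
  n=${1:-10}
  for dir in /proc/[0-9]*; do
    pid=${dir#/proc/}
    count=$(count_fds "$pid")
    cmd=$(cat "$dir/comm" 2>/dev/null)
    echo "$count $pid $cmd"
  done | sort -rn | head -n "$n"
)

top_fd_users 10
```

Comparing a process's count with its "Max open files" soft limit shows how close it is to error 24.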
For this case, the recommendation is to adjust the Linux settings and raise the open files limit.
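As a hedged illustration of raising the limit for one session, the sketch below lifts the soft open-files limit of the current shell, capped by the hard limit. The function name raise_nofile_soft and the target value 65536 are assumptions chosen for the example, not values taken from this case; a permanent change still has to be made in the limits configuration and picked up by a restart of the affected processes.

```shell
#!/bin/sh
# Raise the soft open-files limit of the current shell to at most $1.
# The value is capped by the hard limit, and an already higher soft
# limit is left untouched.
raise_nofile_soft() {
  want=$1
  hard=$(ulimit -Hn)
  soft=$(ulimit -Sn)
  if [ "$soft" = "unlimited" ]; then
    return 0
  fi
  if [ "$hard" != "unlimited" ] && [ "$want" -gt "$hard" ]; then
    want=$hard
  fi
  if [ "$soft" -lt "$want" ]; then
    ulimit -Sn "$want"
  fi
}

raise_nofile_soft 65536   # 65536 is a hypothetical target
ulimit -Sn                # show the resulting soft limit
```

Processes started from this shell afterwards inherit the new soft limit; processes that are already running keep the limit they started with.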
For limits on the maximum number of processes, see another blog post, "Troubleshooting errors caused by OS resource limit on AIX,HP-UX, SolarisOS, Linux".