
Troubleshooting errors caused by OS resource limits on AIX, HP-UX, Solaris and Linux

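Operating system resource limits can prevent the applications running on top of them from forking new processes or opening files. This makes new connections fail or even crashes the instance. It is especially likely when the database's process count has been raised substantially but the OS kernel resource limits configured at installation were not raised to match. Sometimes Linux commands report "fork: retry: No child processes", and new database connections fail with ORA-12518 or ORA-12537.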

CASE 1, HP-UX 11.31: new connections through the listener fail

TNS-12518: TNS:listener could not hand off client connection
TNS-12536: TNS:operation would block
TNS-12560: TNS:protocol adapter error
TNS-00506: Operation would block
HPUX Error: 246: Operation would block

The errors above show that the OS process limit has been reached. Check nproc and maxuprc. The values recommended for Oracle environments are:

Parameter          NODE1           Oracle recommended value
aio_max_ops        8192            >= 2048
executable_stack   0               0
filecache_min      3%              5%
filecache_max      5%              10%
ksi_alloc_max      131072          >= nproc*8
max_async_ports    16384           >= nproc
max_thread_proc    1200            >= 1024
maxdsiz            1073741824      >= 1073741824
maxdsiz_64bit      137438953472    >= 2147483648
maxssiz            134217728       >= 134217728
maxssiz_64bit      2147483648      >= 1073741824
maxuprc            20000           >= ((nproc*9)/10)+1
msgmni             16384           >= nproc
msgtql             16384           >= nproc
ncsize             134144          8*nproc+3072
nflocks            16384           >= nproc

Check current resource usage with kcusage:

oracle@anbob:/home/oracle> kcusage                                                                                                                                                                                     
Tunable                 Usage / Setting      
=============================================
filecache_max     33312395264 / 39194697728
maxdsiz             352256000 / 1073741824
maxdsiz_64bit       239075328 / 137438953472
maxfiles_lim            23564 / 65535
maxssiz                131072 / 134217728
maxssiz_64bit         2097152 / 2147483648
maxtsiz              13484032 / 100663296
maxtsiz_64bit       771751936 / 1073741824
maxuprc                 15551 / 16384
max_thread_proc           385 / 1200
msgmbs                      0 / 8
msgmni                      2 / 16384
msgtql                      0 / 16384
nflocks                   101 / 16384
ninode                  10403 / 1157120
nkthread                18576 / 28688
nproc                   16382 / 21000
npty                        2 / 60
nstrpty                    12 / 60
nstrtel                     0 / 60
nswapdev                    2 / 32
nswapfs                     0 / 32
semmni                    116 / 1024
semmns                  22095 / 307200
shmmax           161061273600 / 274877906944
shmmni                     39 / 4096
shmseg                      4 / 512

CASE 2, AIX: an application fails at run time with

ORA-04030:  (TCHK^9d12ad4,eavp:kkestRCHistgrm)

The call stack contains "kghnospc" => kernel generic heap manager: no space available in the heap, signal an error.
Trace file dump:

=======================================
PRIVATE MEMORY SUMMARY FOR THIS PROCESS
---------------------------------------
******************************************************
PRIVATE HEAP SUMMARY DUMP
111 MB total:   # the process uses 111 MB of PGA
   111 MB commented, 605 KB permanent
    47 KB free (0 KB in empty extents),
     103 MB,   1 heap:    "session heap   "
------------------------------------------------------
Summary of subheaps at depth 1
110 MB total:
    35 MB commented, 109 KB permanent
    75 MB free (30 MB in empty extents),
      45 MB,   1 heap:    "kolr heap ds i "            44 MB free held
      28 MB,   3 heaps:   "koh dur heap d "            1056 KB free held

-------------------------
Top 10 processes:
-------------------------
(percentage is of 1697 MB total allocated memory)
 7% pid 201: 111 MB used of 112 MB allocated  # CURRENT PROC: the current process uses the most, 111 MB
 4% pid 204: 56 MB used of 63 MB allocated (5696 KB freeable)
 4% pid 13: 60 MB used of 62 MB allocated
 4% pid 14: 59 MB used of 62 MB allocated
 3% pid 202: 53 MB used of 59 MB allocated (6016 KB freeable)
 3% pid 200: 52 MB used of 58 MB allocated (5824 KB freeable)
 3% pid 40: 52 MB used of 56 MB allocated (832 KB freeable)
 3% pid 37: 41 MB used of 55 MB allocated
 3% pid 12: 50 MB used of 52 MB allocated
 3% pid 173: 42 MB used of 49 MB allocated (5888 KB freeable)

================
SWAP INFORMATION
----------------
swap info: free_mem = 22096.49M rsv = 192.00M
           alloc = 112.52M avail = 49152.00M swap_free = 49039.48M
----- End of Customized Incident Dump(s) -----

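This is similar to a case reported on ITPUB. Check the PGA settings, _pga_max_size and _smm_max_size, the OS limits shown by "ulimit -a", and the limits actually in effect for the affected process.

On AIX, dbx can show the limits of a running process. Pick a LOCAL=NO server process or the listener process: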

# dbx -a [pid]
Type 'help' for help.
reading symbolic information ...
stopped in read at 0x90000000003c260 ($t1)
0x90000000003c260 (read+0x260) e8410028             ld   r2,0x28(r1)
(dbx) proc rlimit
rlimit name:          rlimit_cur               rlimit_max       (units)
 RLIMIT_CPU:         (unlimited)             (unlimited)        sec
 RLIMIT_FSIZE:       (unlimited)             (unlimited)        bytes
 RLIMIT_DATA:          134217728             (unlimited)        bytes  
 RLIMIT_STACK:          33554432              4294967296        bytes
 RLIMIT_CORE:        (unlimited)             (unlimited)        bytes
 RLIMIT_RSS:            33554432             (unlimited)        bytes
 RLIMIT_AS:          (unlimited)             (unlimited)        bytes
 RLIMIT_NOFILE:           100000             (unlimited)        descriptors
 RLIMIT_THREADS:     (unlimited)             (unlimited)        per process
 RLIMIT_NPROC:       (unlimited)             (unlimited)        per user
(dbx) 

Note:
in the dbx rlimit output, the RLIMIT_CUR is the soft limit and the RLIMIT_MAX is the hard limit. RLIMIT_CUR is the limit that is actually enforced, so the problem may persist if RLIMIT_CUR is not unlimited, even though RLIMIT_MAX may be unlimited. In this case, the instance may need to be restarted in order for RLIMIT_CUR to take on the new value.

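Processes that do not come through the listener are not affected by this; they inherit the oracle user's limits. Server processes created through the listener inherit the listener's limits, and the listener in turn inherits the limits of whoever started it. If the listener is started by OHASD/CRS, it inherits root's limits. If it is started manually as grid, check the grid user's limits. Also note that after the OS limits are changed, processes that have not been restarted do not pick up the new values.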

CASE 3, Solaris (SunOS): ORA-4030 caused by insufficient swap

Use plimit to view a process's limits:

# plimit [PID]

Sort the OSWatcher vmstat samples by swap:
$ awk '/^zzz/{t=$5;next}/^[[:space:]]*[0-9]/{print t,$4,$5}' xxxxxx_vmstat_16.10.31.1500.dat | sort -k2,2rn


How to Configure Swap Space (Doc ID 286388.1) recommends swap space = 75% of OS memory.
How does the Solaris Operating System Calculate Available Swap? (Doc ID 1010585.1)

When a process calls the malloc()/sbrk() commands, only virtual swap is allocated.
The operating system allocates the memory from physical disk-based swap first.
If disk-based swap is exhausted or unconfigured, the reservation is allocated from physical memory.
If both resources are exhausted then the malloc() call fails.
To ensure malloc() won't fail due to lack of virtual swap, configure a large physical disk-based swap
facility in the form of a device or swapfile.  You can monitor swap reservation via "swap -s" and "vmstat:swap",
as described above.

Follow the guidelines below to calculate amount of virtual swap usage:
Virtual swap = Physical Memory + Fixed Disk swap

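CASE 4, Linux: running out of processes, for example after installing the OEM agent
On Linux there are several ways to check a process's current limits. One is to read it from the /proc file system: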

$ cat /proc/PID/limits

nproc is defined at the OS level to limit the number of processes per user. The Oracle 11.2.0.4 documentation recommends:

oracle soft nproc 2047
oracle hard nproc 16384

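This can be a bit low if the OEM agent is running. Do you want to check whether you are over the limit? You can use "ps", but note that by default "ps" does not show every process. On Linux, each thread of a multithreaded program is implemented as a lightweight process (LWP), and you must use "-L" to see all of them. For example, grouped by user: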

$ ps h -Led -o user | sort | uniq -c | sort -n

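Without "-L", you can also use "ps -o nlwp,pid,lwp,args -u oracle | sort -n". In some environments the Oracle 12c EM agent starts more than 1,000 threads. When a user reaches the nproc limit, it can no longer create new processes: the clone() call returns EAGAIN, which Oracle reports as: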
ORA-27300: OS system dependent operation:fork failed with status: 11
ORA-27301: OS failure message: Resource temporarily unavailable

Below is Franck's short fork test program, slightly modified, used to reproduce the problem:

[root@anbob ~]# ps h -Led -o user | sort | uniq -c | sort -n
      1 chrony
      1 dbus
      1 oracle
      1 rpc
      7 polkitd
    133 root
[oracle@oel7db1 ~]$ ulimit -u 500
[oracle@oel7db1 ~]$ cat fockp.c
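#include <stdio.h>
#include <errno.h>
#include <unistd.h>
#include <sys/resource.h>

int main( int argc, char *argv[] )
{
        int i;
        pid_t p;
        // get nproc limit (only used in the error message)
        struct rlimit rl;
        if ( getrlimit( RLIMIT_NPROC , &rl) != 0 ) {
                printf("getrlimit() failed with errno=%d\n", errno);
                return 255;
        }
        // fork up to 3000 times; each new process forks the next one (a chain)
        for( i=1 ; i<= 3000 ; i++ ) {
                p = fork();
                if ( p >= 0 ) {
                        if ( p == 0 ) {
                                // fork() returned 0: this is the new process; it
                                // reports success and continues the loop
                                printf("parent says fork number %d sucessful \n" , i );
                        } else {
                                // the calling process reports the new pid, then
                                // sleeps (still counting against nproc) and stops
                                printf(" child says fork number %d pid %d \n" , i , (int) p );
                                sleep(100);
                                break;
                        }
                } else {
                        // fork() failed: errno 11 (EAGAIN) once nproc is reached
                        printf("parent says fork number %d failed (nproc: soft=%llu hard=%llu) with errno=%d\n",
                               i, (unsigned long long) rl.rlim_cur, (unsigned long long) rl.rlim_max, errno);
                        return 255;
                }
        }
        return 0;
}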

Compile and run:
[oracle@anbob ~]$ ./fockp
 child says fork number 1 pid 2442
parent says fork number 1 sucessful
 child says fork number 2 pid 2443
parent says fork number 2 sucessful
 child says fork number 3 pid 2444
parent says fork number 3 sucessful
 child says fork number 4 pid 2445
parent says fork number 4 sucessful
...
parent says fork number 497 sucessful
 child says fork number 498 pid 2941
parent says fork number 498 sucessful
parent says fork number 499 failed (nproc: soft=500 hard=500) with errno=11

Check as root:
[root@anbob ~]# ps h -Led -o user | sort | uniq -c | sort -n
      1 chrony
      1 dbus
      1 rpc
      7 polkitd
    133 root
    500 oracle

On Linux, this limit was originally set in /etc/limits.conf and checked with 'ulimit -u'. According to the official RHEL documentation, on RHEL 5 through 8 it is set in /etc/security/limits.conf.

How to set ulimit values
Environment
Red Hat Enterprise Linux (RHEL) 5, 6, 7, 8
Issue
How to set ulimit values
Resolution
Settings in /etc/security/limits.conf take the following form:
# vi /etc/security/limits.conf
#<domain>        <type>  <item>           <value>

*               -       core             <value>
*               -       data             <value>
*               -       priority         <value>
*               -       fsize            <value>
*               soft    sigpending       <value>  eg:57344
*               hard    sigpending       <value>  eg:57444
*               -       memlock          <value>
*               -       nofile           <value>  eg:1024
*               -       msgqueue         <value>  eg:819200
*               -       locks            <value>
*               soft    core             <value>
*               hard    nofile           <value>
@<group>        hard    nproc            <value>
<user>          soft    nproc            <value>
%<group>        hard    nproc            <value>
<user>          hard    nproc            <value>
@<group>        -       maxlogins        <value>
<user>          hard    cpu              <value>
<user>          soft    cpu              <value>
<user>          hard    locks            <value>
<domain> can be:

a user name
a group name, with @group syntax
the wildcard *, for default entry
the wildcard %, can be also used with %group syntax, for maxlogin limit
<type> can have two values:

soft for enforcing the soft limits
hard for enforcing hard limits
<item> can be one of the following:

core - limits the core file size (KB)
data - max data size (KB)
fsize - maximum filesize (KB)
memlock - max locked-in-memory address space (KB)
nofile - max number of open files
rss - max resident set size (KB)
stack - max stack size (KB)
cpu - max CPU time (MIN)
nproc - max number of processes (see note below)
as - address space limit (KB)
maxlogins - max number of logins for this user
maxsyslogins - max number of logins on the system
priority - the priority to run user process with
locks - max number of file locks the user can hold
sigpending - max number of pending signals
msgqueue - max memory used by POSIX message queues (bytes)
nice - max nice priority allowed to raise to values: [-20, 19]
rtprio - max realtime priority
Exit and re-login from the terminal for the change to take effect.

The Red Hat article "Setting nproc in /etc/security/limits.conf has no effect in Red Hat Enterprise Linux" covers the case where an nproc setting does not take effect:
Resolution
Add the desired entry in /etc/security/limits.d/90-nproc.conf instead of /etc/security/limits.conf.
Root Cause
For limits, the PAM stack is moving to a modular configuration. This includes the introduction of /etc/security/limits.d/90-nproc.conf, which sets the maximum number of processes to 1024 for non-root users. This was done in part to prevent fork-bombs.

After reading /etc/security/limits.conf, individual files from the /etc/security/limits.d/ directory are read. Only files with *.conf extension will be read from this directory.

So if you configure the Oracle environment with the Oracle preinstall RPM, you will find that it also puts its settings in /etc/security/limits.d/oracle-rdbms-server-12cR1-preinstall.conf, which overrides /etc/security/limits.conf.

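One of Linux's strengths is that you can control almost everything about it. This gives administrators fine-grained control over their systems and lets them make better use of system resources. You can even change the limits of a program that is already running, which helps with application servers that cannot be restarted. On Linux systems with kernel >= 2.6.36 and util-linux >= 2.21, you can use the prlimit command to set a process's resource limits (somewhat like plimit on Solaris).

The following shows how to change the limits of a running program: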

[root@anbob ~]# ps -ef|grep lsnr
oracle   15837     1  0 00:02 ?        00:00:00 /u01/app/oracle/product/19.2.0/db_1/bin/tnslsnr LISTENER -inherit
root     16128 16100  0 00:07 pts/1    00:00:00 grep --color=auto lsnr

[root@anbob ~]# cat /proc/15837/limits
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            10485760             33554432             bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             16384                16384                processes
Max open files            65536                65536                files
Max locked memory         137438953472         137438953472         bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       14595                14595                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

[oracle@anbob ~]$ gdb
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-115.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.

(gdb) attach 15837
Attaching to process 15837

(gdb) set $rlim = &{0ll, 0ll}
(gdb) print getrlimit(7, $rlim)
$1 = 0
(gdb) print *$rlim
$2 = {65536, 65536}

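TIP: resource numbers accepted by getrlimit()/setrlimit() on Linux x86_64 (the same order as the rows of /proc/PID/limits):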
        Limit                 
     0  Max cpu time          
     1  Max file size         
     2  Max data size         
     3  Max stack size        
     4  Max core file size    
     5  Max resident set      
     6  Max processes         
     7  Max open files        
     8  Max locked memory     
     9  Max address space     
    10  Max file locks        
    11  Max pending signals   
    12  Max msgqueue size     
    13  Max nice priority     
    14  Max realtime priority 
    15  Max realtime timeout  

# modify with gdb
(gdb) set *$rlim[0] = 1024*4
(gdb) print *$rlim
$3 = {4096, 65536}
(gdb) print setrlimit(7, $rlim)
$4 = 0

[root@anbob ~]# cat /proc/15837/limits|grep "open files"
Limit                     Soft Limit           Hard Limit           Units
Max open files            4096                 65536                files

[root@anbob ~]# prlimit  --nofile --output RESOURCE,SOFT,HARD --pid 15837
RESOURCE SOFT  HARD
NOFILE   4096 65536


# modify with prlimit
[root@anbob ~]# prlimit --nofile=1024:8192 --pid 15837

[root@anbob ~]# cat /proc/15837/limits |grep "open files"
Limit                     Soft Limit           Hard Limit           Units
Max open files            1024                 8192                 files

(gdb) print getrlimit(7, $rlim)
$5 = 0
(gdb) print *$rlim
$6 = {1024, 8192}
(gdb)

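Note:
Resource limits come in two kinds, soft and hard. The soft limit is the limit that is actually enforced. The hard limit is only the ceiling up to which the soft limit can be raised, for example with the ulimit command; raising the hard limit itself requires root privileges.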
