Troubleshoot ORA-27544, ORA-27300, ORA-27301, ORA-27302, "HPUX-ia64 Error: 23: File table overflow" issue
One of our production databases is a two-node 10.2.0.5 RAC running on HP-UX 11.31. For a period of time it could not establish new connections. Checking the alert log, I found ORA-27544, ORA-27300, ORA-27301, ORA-27302 and "HPUX-ia64 Error: 23: File table overflow" errors. The problem appeared to be OS related.
#alert log
Sun Nov 30 15:47:16 EAT 2014
Global Enqueue Services Deadlock detected. More info in file
 /opt/oracle/app/admin/xxxdb/bdump/xxxdb1_lmd0_5890.trc.
Sun Nov 30 16:35:55 EAT 2014
Thread 1 advanced to log sequence 61987 (LGWR switch)
  Current log# 14 seq# 61987 mem# 0: /dev/vg_anbob13/rvgcrm13_8_039
Sun Nov 30 16:47:05 EAT 2014
Thread 1 advanced to log sequence 61988 (LGWR switch)
  Current log# 1 seq# 61988 mem# 0: /dev/vg_anbob01/rvgcrm01_redo01
Sun Nov 30 08:57:51 UTC 2014
Errors in file /opt/oracle/app/admin/xxxdb/udump/xxxdb1_ora_20033.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-27544: Failed to map memory region for export
ORA-27300: OS system dependent operation:socket failed with status: 23
ORA-27301: OS failure message: File table overflow
ORA-27302: failure occurred at: sskgxpcre1
Sun Nov 30 08:57:54 UTC 2014
Errors in file /opt/oracle/app/admin/xxxdb/udump/xxxdb1_ora_20235.trc:
ORA-00603: ORACLE server session terminated by fatal error
ORA-01116: error in opening database file 521
ORA-01110: data file 521: '/dev/vg_anbob04/rvgcrm04_8_173'
ORA-27041: unable to open file
HPUX-ia64 Error: 23: File table overflow
Additional information: 3
ORA-01116: error in opening database file 504
ORA-01110: data file 504: '/dev/vg_anbob03/rvgcrm03_8_021'
ORA-27041: unable to open file
HPUX-ia64 Error: 23: File table overflow
# os log
$ vi /var/adm/syslog/syslog.log
Nov 30 16:40:02 anbobdba su: - tty?? dsg-dsg
Nov 30 16:40:02 anbobdba su: - tty?? dsg-dsg
Nov 30 16:40:02 anbobdba above message repeats 2 times
Nov 30 16:45:01 anbobdba telnetd[21220]: getpid: peer died: Error 0
Nov 30 16:50:01 anbobdba telnetd[4104]: getpid: peer died: Error 0
Nov 30 16:50:02 anbobdba su: - tty?? dsg-dsg
Nov 30 16:50:02 anbobdba su: - tty?? dsg-dsg
Nov 30 16:50:02 anbobdba above message repeats 2 times
Nov 30 16:55:01 anbobdba telnetd[15605]: getpid: peer died: Error 0
Nov 30 16:57:51 anbobdba vmunix: file: table is full
Nov 30 16:57:51 anbobdba vmunix: ffiillee:: ttaabbllee iiss ffuullll
Nov 30 16:57:51 anbobdba vmunix:
Nov 30 16:57:51 anbobdba vmunix: ffiillee:: ttaabbllee iiss ffuullll
Nov 30 16:57:51 anbobdba vmunix:
Nov 30 16:57:51 anbobdba vmunix: file: table is full
Nov 30 16:57:53 anbobdba above message repeats 1900 times
Nov 30 16:57:54 anbobdba vmunix: file: table is full
Nov 30 16:57:54 anbobdba vmunix: file: table is full
Nov 30 16:57:54 anbobdba above message repeats 4984 times
Nov 30 16:57:54 anbobdba vmunix: file: table is full
Nov 30 16:57:54 anbobdba vmunix: ffiillee:: ttaabbllee iiss ffuullll
Nov 30 16:57:54 anbobdba vmunix:
Nov 30 16:57:54 anbobdba vmunix: file: table is full
Nov 30 16:57:54 anbobdba above message repeats 1174 times
Nov 30 16:57:54 anbobdba vmunix: file: table is full
Nov 30 16:57:55 anbobdba vmunix: file: table is full
# user limit
oracle#ulimit -a
time(seconds)        unlimited
file(blocks)         unlimited
data(kbytes)         2097152
stack(kbytes)        204800
memory(kbytes)       unlimited
coredump(blocks)     4194303
nofiles(descriptors) 4096

[oracle@anbobdba:/opt/oracle/app/product/10.2.0/db_1/network/log]#/usr/sbin/kcweb -F
Kernel Configuration->Tunables (All)
----------------------------------------------------------------------------------------------------------
Tunable                    Tuning      Current      Next Boot    Default      Usage   Module
                           Capability  Value        Value        Value
==========================================================================================================
fcd_disable_mgmt_lun       Dynamic     0            0            0            -       fcd
fclp_ifc_disable_mgmt_lun  Dynamic     0            0            0            -       fclp
filecache_max              Auto        15656274165  Automatic    97851711488  99.4%   fs_bufcache
filecache_min              Auto        9785167872   -            9785167872   -       fs_bufcache
fr_rulecache               Dynamic     0            0            0            -       ipf
fr_statemax                Dynamic     800000       800000       800000       -       ipf
fr_tcpidletimeout          Dynamic     86400        86400        86400        -       ipf
fs_async                   Static      0            0            0            -       fs
fs_symlinks                Dynamic     20           20           20           -       fs
ftable_hash_locks          Static      64           64           64           -       fs_filedscrp
gvid_no_claim_dev          Dynamic     0            0            0            -       gvid_core
hires_timeout_enable       Dynamic     0            0            0            -       pm_callout
hp_hfs_mtra_enabled        Static      1            1            1            -       ufs
intr_strobe_ics_pct        Dynamic     80           80           80           -       svc
io_ports_hash_locks        Static      64           64           64           -       io
ipf_icmp6_passthru         Dynamic     0            0            0            -       ipf
ipl_buffer_sz              Dynamic     8192         8192         8192         -       ipf
ipl_logall                 Dynamic     0            0            0            -       ipf
ipl_suppress               Dynamic     1            1            1            -       ipf
ipmi_watchdog_action       Dynamic     0            0            0            -       ipmi
kmem_aggressive_caching    Dynamic     0            0            0            -       vm_kmem
ksi_alloc_max              Dynamic     33600        33600        33600        -       pm_sig
ksi_send_max               Static      32           32           32           -       pm_sig
lcpu_attr                  Auto        0            0            0            -       pm_sched
lotsfree_pct               Dynamic     0            0            0            -       vm
max_acct_file_size         Dynamic     2560000      2560000      2560000      -       pm_acct
max_async_ports            Dynamic     4096         4096         4096         -       asyncdsk
max_mem_window             Dynamic     0            0            0            -       vm
max_thread_proc            Dynamic     2048         2048         256          12.5%   pm_proc
maxdsiz                    Dynamic     2147483648   2147483648   1073741824   66.2%   vm
maxdsiz_64bit              Dynamic     17179869184  17179869184  4294967296   1.8%    vm
maxfiles                   Static      4096         10240        2048         -       fs
maxfiles_lim               Dynamic     10240        10240        4096         16.4%   fs
maxrsessiz                 Static      8388608      8388608      8388608      -       vm
maxrsessiz_64bit           Static      8388608      8388608      8388608      -       vm
maxssiz                    Dynamic     209715200    209715200    8388608      0.5%    vm
maxssiz_64bit              Dynamic     1073741824   1073741824   268435456    0.1%    vm
maxtsiz                    Dynamic     100663296    100663296    100663296    35.6%   vm
maxtsiz_64bit              Dynamic     1073741824   1073741824   1073741824   20.3%   vm
maxuprc                    Dynamic     20000        20000        256          4.2%    pm_proc
mca_recovery_on            Auto        1            -            1            -       shutdown
mpas_readonly_text         Dynamic     0            0            0            -       vm
mprotect_reduce_protid_on  Dynamic     0            0            0            -       vm
msgmbs                     Dynamic     8            8            8            -       pm_usync
msgmnb                     Dynamic     16384        16384        16384        -       pm_usync
msgmni                     Dynamic     4096         4096         512          0.1%    pm_usync
msgtql                     Dynamic     4096         4096         1024         0.0%    pm_usync
ncdnode                    Static      150          150          150          -       cdfs

$glance H f
Glance C.04.70.001    09:44:59    anbobdba    ia64          Current   Avg   High
----------------------------------------------------------------------------------
Cpu  Util   S SN NRU U                                    |    95%    93%    95%
Disk Util   F F                                           |   100%   100%   100%
Mem  Util   S SU U                                        |    78%    78%    78%
Networkil   U UR R                                        |    68%    68%    68%
----------------------------------------------------------------------------------
SYSTEM TABLES REPORT                                                   Users=    9
System Table                 Available     Used     Utilization    High(%)
--------------------------------------------------------------------------------
Proc Table (nproc)               42975     2993           7            7
File Table (nfile)              650480   597066          92           92
Shared Mem Table (shmmni)          512       12           2            2
Message Table (msgmni)            4096        6           0            0
Semaphore Table (semmni)          4096       33           1            1
File Locks (nflocks)             36000     3356           9            9
Pseudo Terminals (npty)             60        0           0            0
Buffer Headers (nbuf)               na     2560          na           na
or
#kctune -v nfile
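The Utilization column in the glance report above is simply Used divided by Available. A minimal sketch of that calculation, which could feed a periodic check, follows. The function names and the 90% warning level are assumptions for illustration only; the input numbers are the ones from the report.

```shell
#!/bin/sh
# Sketch only: table_util_pct and table_status are hypothetical helper names,
# and the 90% warning level is an arbitrary example, not an HP recommendation.

# Percentage of a kernel table in use, rounded to the nearest integer.
table_util_pct() {
    used=$1
    avail=$2
    echo $(( (used * 100 + avail / 2) / avail ))
}

# Prints WARN when utilization reaches the given percentage, otherwise OK.
table_status() {
    pct=$(table_util_pct "$1" "$2")
    if [ "$pct" -ge "$3" ]; then
        echo "WARN"
    else
        echo "OK"
    fi
}

# Numbers taken from the glance SYSTEM TABLES REPORT above.
echo "nfile: $(table_util_pct 597066 650480)% used, status $(table_status 597066 650480 90)"
echo "nproc: $(table_util_pct 2993 42975)% used, status $(table_status 2993 42975 90)"
```

On a live system the Used and Available figures would have to come from glance or sar -v; how to extract them depends on the output format of the installed version, so that part is left out here.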
Tip:
nfile: maximum number of files that can be open system-wide, across all processes
maxfiles_lim: hard limit on the number of files a single process can open
nproc: maximum number of processes that can exist on the system at the same time
Note:
Increasing nfile increases kernel memory usage. As a rule of thumb, set it 10-25% above the peak number of open files. The per-process limit on open files is the kernel parameter maxfiles (default 2048), which is capped by the hard limit maxfiles_lim. For details, see "man nfile" (and likewise maxfiles_lim and nproc).
On 32-bit systems: each nfile entry takes 56 bytes.
On 64-bit systems: each nfile entry takes 88 bytes.
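To make the sizing note concrete, here is a small sketch that applies the 10-25% headroom rule and the per-entry sizes quoted above. The function names are hypothetical, and the assumption that memory cost is just entries times bytes per entry is a simplification; the peak value used is the "Used" figure from the glance report, which may be lower than the true peak.

```shell
#!/bin/sh
# Sketch only: suggest_nfile and nfile_table_bytes are hypothetical helpers.

# Suggested nfile = peak open files plus a headroom percentage, rounded up.
suggest_nfile() {
    peak=$1
    headroom_pct=$2
    echo $(( (peak * (100 + headroom_pct) + 99) / 100 ))
}

# Approximate file-table memory: entries * bytes per entry.
# Defaults to 88 bytes (the 64-bit figure); pass 56 for 32-bit systems.
nfile_table_bytes() {
    entries=$1
    bytes_per_entry=${2:-88}
    echo $(( entries * bytes_per_entry ))
}

# 597066 open files observed in glance, with 25% headroom.
new_nfile=$(suggest_nfile 597066 25)
echo "suggested nfile: $new_nfile"
echo "approx. table size: $(nfile_table_bytes "$new_nfile") bytes"
```

Even at several hundred thousand entries the table costs tens of megabytes, so on a server of this size memory is unlikely to be the limiting factor when raising nfile.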
Also, following the Oracle documentation, I configured the kernel parameters as shown below.
Oracle Recommended Kernel Parameter settings for HP Itanium v3 11.31
http://docs.oracle.com/cd/E14004_01/books/PerformTun/PerformTunOS12.html#wp1307268
Modify the HP-UX kernel parameters to values like those shown below (suggested guidelines). Use the HP-UX System Administration Manager (SAM) tool to make these changes.
nproc            4096         -   4096
ksi_alloc_max    32768        -   (NPROC*8)
max_thread_proc  4096         -   4096
maxdsiz          0x90000000   -   0X90000000
maxdsiz_64bit    2147483648   -   2147483648
maxfiles         4000         -   4000
maxssiz          401604608    -   401604608
maxssiz_64bit    1073741824   -   1073741824
maxtsiz          0x40000000   -   0X40000000
msgmap           4098         -   (NPROC+2)
msgmni           4096         -   (NPROC)
msgtql           4096         -   (NPROC)
ncsize           35840        -   (8*NPROC+2048+VX_NCSIZE)
nfile            67584        -   (16*NPROC+2048)
ninode           34816        -   (8*NPROC+2048)
nkthread         7184         -   (((NPROC*7)/4)+16)
nproc            4096         -   4096
nsysmap          8192         -   ((NPROC)>800?2*(NPROC):800)
nsysmap64        8192         -   ((NPROC)>800?2*(NPROC):800)
semmni           1024         -   1024
semmns           16384        -   ((NPROC*2)*2)
semmnu           2048         -   2048
semume           256          -   256
shmmax           0x40000000   Y   0X40000000
shmmni           1024         -   1024
shmseg           1024         Y   1024
vps_ceiling      64           -   64
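Several values in the list above are formulas in NPROC, so they change when nproc does. The sketch below evaluates those formulas for a given nproc; derive_from_nproc is a hypothetical helper name, and ncsize is omitted because it depends on VX_NCSIZE, whose value is not given here.

```shell
#!/bin/sh
# Sketch only: evaluates the NPROC-based formulas from the list above.
derive_from_nproc() {
    nproc=$1
    echo "ksi_alloc_max $(( nproc * 8 ))"
    echo "msgmap $(( nproc + 2 ))"
    echo "msgmni $nproc"
    echo "msgtql $nproc"
    echo "nfile $(( 16 * nproc + 2048 ))"
    echo "ninode $(( 8 * nproc + 2048 ))"
    echo "nkthread $(( (nproc * 7) / 4 + 16 ))"
    # nsysmap and nsysmap64: 2*NPROC when NPROC is above 800, else 800.
    if [ "$nproc" -gt 800 ]; then
        nsysmap=$(( 2 * nproc ))
    else
        nsysmap=800
    fi
    echo "nsysmap $nsysmap"
    echo "nsysmap64 $nsysmap"
    echo "semmns $(( nproc * 2 * 2 ))"
}

# With nproc=4096 this reproduces the values in the list above.
derive_from_nproc 4096
```

Running it with the nproc actually configured on this system (42975 in the glance report) gives a formula-based nfile that can be compared with the current setting.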
By the way, if you monitor file table usage, another option is to set the alert threshold to oracle.process * oracle.datafiles + 2048 and the limit to nproc * oracle.datafiles.
NAME                                               VALUE            UNIT
-------------------------------------------------- ---------------- ------------
aggregate PGA target parameter                     42949672960      bytes
aggregate PGA auto target                          36763075584      bytes
global memory bound                                1073741824       bytes
total PGA inuse                                    2100608000       bytes
total PGA allocated                                2831215616       bytes
maximum PGA allocated                              10462026752      bytes
total freeable PGA memory                          400359424        bytes
process count                                      831
max processes count                                2999
PGA memory freed back to OS                        15560814297088   bytes
total PGA used for auto workareas                  0                bytes
maximum PGA used for auto workareas                4534104064       bytes
total PGA used for manual workareas                0                bytes
maximum PGA used for manual workareas              17006592         bytes
over allocation count                              0
bytes processed                                    42472610850816   bytes
extra bytes read/written                           494029733888     bytes
cache hit percentage                               98.85            percent
recompute count (total)                            2282871

SQL> show parameter process
NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
aq_tm_processes                      integer     0
db_writer_processes                  integer     6
gcs_server_processes                 integer     12
job_queue_processes                  integer     10
log_archive_max_processes            integer     2
processes                            integer     3000
Use the "lsof" command to find out what is using the file descriptors on the system.
# Unique PIDs, then every PID occurrence (one per open file)
lsof -g | awk '{print $2}' | sort -u > /tmp/lsof_sort.txt
lsof -g | awk '{print $2}' > /tmp/lsof.txt
# For each PID, count how many lines (open files) belong to it
for var in `cat /tmp/lsof_sort.txt`
do
  echo `echo "$var ---- "``grep -x $var /tmp/lsof.txt | wc -l`
done
This lists every process together with the number of files it has open. Pick the processes with the most open files and check what they are.
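The same per-process count can be produced in a single pass, without temporary files and without running grep once per PID. This is a sketch, not the original author's script: count_open_files_by_pid is a hypothetical name, it assumes the PID is the second column as in the script above, and it skips the first line on the assumption that it is the lsof header.

```shell
#!/bin/sh
# Sketch only: reads "lsof -g" style output on stdin and prints
# "<open-file-count> <pid>", busiest process first.
count_open_files_by_pid() {
    awk 'NR > 1 { n[$2]++ } END { for (pid in n) print n[pid], pid }' | sort -rn
}

# Typical use on a live system (assumes lsof is installed):
#   lsof -g | count_open_files_by_pid | head -20

# Demonstration with made-up input in the same column layout.
printf '%s\n' \
    'COMMAND PID PGID USER FD' \
    'oracle 101 101 oracle 3u' \
    'oracle 101 101 oracle 4u' \
    'oracle 101 101 oracle 5u' \
    'tnslsnr 202 202 oracle 3u' |
    count_open_files_by_pid
```

With a file table holding several hundred thousand entries, the single pass matters: the loop version rescans the full lsof output once for every PID.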
or
Use the scripts provided by HP engineers:
glance -adviser_only -syntax /tmp/proc_num_files -iterations 1
Alternatively, check with "sar -v". Another useful command is "fuser".
Summary:
Error 23: File table overflow. The system's table of open files is full, and temporarily no more open() calls can be accepted.
Increase the value of the kernel parameter "maxusers", as it influences the default value of "nfile". If this does not solve the problem, you can increase "nfile" independently.
For kernel parameters in general, see:
http://docs.hp.com/en/939/KCParms/KCparams.OverviewAll.html
To modify kernel parameters (from HP docs):
as root
#kctune nfile=xxxx
or
* Enter the SAM command to start the System Administration Manager (SAM) program.
* Double-click the Kernel Configuration icon.
* Double-click the Configurable Parameters icon.
* Double-click the parameter that you want to change and type the new value in the Formula/Value field.
* Click OK.
* Repeat these steps for all of the kernel configuration parameters that you want to change.
* When you are finished setting all of the kernel configuration parameters, select Action --> Process New Kernel from the action menu bar.
The HP-UX operating system automatically restarts after you change the values for the kernel configuration parameters.