首页 » ORACLE 9i-23ai » Troubleshooting TNS-12547&TNS-12560 AIX error 32: Broken pipe caused by tcp socket leak

Troubleshooting TNS-12547&TNS-12560 AIX error 32: Broken pipe caused by tcp socket leak

上周出现个蹊跷案例,最近一直在忙今天简单的记录一下, 中间件反馈数据库连接时失败,在数据库使用lsnrctl status 查看监听状态会发现Listener一会儿正常,一会儿报错,但是在Listener正常时可以看出listener的start date并没有restart过,报错信息如下:

[oracle@anbob1:/home/oracle]# /oracle/app/11.2.0.3/grid/bin/lsnrctl status

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 04-JUL-2015 19:37:45

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
TNS-12547: TNS:lost contact
 TNS-12560: TNS:protocol adapter error
  TNS-00517: Lost contact
   IBM/AIX RISC System/6000 Error: 32: Broken pipe

Note:
这是一个2 nodes RAC ON AIX6, 11.2.0.3 Grid, 10.2.0.5 DB, 期间数据库和主机没做任何修改, 从”IBM/AIX RISC System/6000 Error: 32: Broken pipe” 更倾向于OS 资源限制,并不像有些案例那样如listener.log 大小限制, 或oracle 执行文件权限不对。

The error 32 indicates the communication has been broken while the listener is trying to hand off the client connection to the server process or dispatcher process.

[oracle@anbob1:/oracle/app/oracle/product/10.2.0/db/OPatch]# ./opatch lsinventory
Invoking OPatch 10.2.0.5.1

Oracle Interim Patch Installer version 10.2.0.5.1
Copyright (c) 2010, Oracle Corporation.  All rights reserved.


Oracle Home       : /oracle/app/oracle/product/10.2.0/db
Central Inventory : /oracle/app/oraInventory
   from           : /etc/oraInst.loc
OPatch version    : 10.2.0.5.1
OUI version       : 10.2.0.5.0
OUI location      : /oracle/app/oracle/product/10.2.0/db/oui
Log file location : /oracle/app/oracle/product/10.2.0/db/cfgtoollogs/opatch/opatch2015-07-06_14-42-44PM.log

Patch history file: /oracle/app/oracle/product/10.2.0/db/cfgtoollogs/opatch/opatch_history.txt

Lsinventory Output file location : /oracle/app/oracle/product/10.2.0/db/cfgtoollogs/opatch/lsinv/lsinventory2015-07-06_14-42-44PM.txt

--------------------------------------------------------------------------------
Installed Top-level Products (2): 

Oracle Database 10g                                                  10.2.0.1.0
Oracle Database 10g Release 2 Patch Set 4                            10.2.0.5.0
There are 2 products installed in this Oracle Home.


Interim patches (2) :

Patch  9952245      : applied on Thu Jan 02 21:23:40 GMT+08:00 2014
Unique Patch ID:  13348225
   Created on 17 Jan 2011, 04:22:06 hrs UTC
   Bugs fixed:
     9952245, 9879181, 9610672, 9872299, 9577921, 8308495, 9103966, 8996060
     6899183, 9261837, 8487143, 9646836

Patch  13632743     : applied on Thu Jan 02 19:58:43 GMT+08:00 2014


[grid@anbob1:/home/grid]# crsctl stat res ora.LISTENER.lsnr -n anbob1 -p
NAME=ora.LISTENER.lsnr
TYPE=ora.listener.type
ACL=owner:grid:rwx,pgrp:oinstall:rwx,other::r--
ACTION_FAILURE_TEMPLATE=
ACTION_SCRIPT=%CRS_HOME%/bin/racgwrap%CRS_SCRIPT_SUFFIX%
AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ALIAS_NAME=ora.%CRS_CSS_NODENAME_LOWER_CASE%.LISTENER_%CRS_CSS_NODENAME_UPPER_CASE%.lsnr
AUTO_START=restore
CHECK_INTERVAL=60
CHECK_TIMEOUT=30
DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=listener) PROPERTY(LISTENER_NAME=PARSE(%NAME%, ., 2))
DEGREE=1
DESCRIPTION=Oracle Listener resource
ENABLED=1
ENDPOINTS=TCP:1521
LOAD=1
LOGGING_LEVEL=1
NLS_LANG=
NOT_RESTARTING_TEMPLATE=
OFFLINE_CHECK_INTERVAL=0
ORACLE_HOME=%CRS_HOME%
PORT=1521
PROFILE_CHANGE_TEMPLATE=
RESTART_ATTEMPTS=5
SCRIPT_TIMEOUT=60
START_DEPENDENCIES=hard(type:ora.cluster_vip_net1.type) pullup(type:ora.cluster_vip_net1.type)
START_TIMEOUT=180
STATE_CHANGE_TEMPLATE=
STOP_DEPENDENCIES=hard(intermediate:type:ora.cluster_vip_net1.type)
STOP_TIMEOUT=0
TYPE_VERSION=1.2
UPTIME_THRESHOLD=1d
USR_ORA_ENV=ORACLE_BASE=/oracle/app/grid
USR_ORA_OPI=false
VERSION=11.2.0.3.0

NOTE:
首先我是尝试重启listener, 尝试了srvctl stop listener和 lsnrctl stop 都并未完全把listener停止,listener processes 的进程一直存在,后通过OS KILL 结束讲程,但是启动时报上面同样的错误,此时另一个节点的listener正常,多次尝试listener启动成功。检查listener

[oracle@anbob1:/home/oracle]# /oracle/app/11.2.0.3/grid/bin/lsnrctl status

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 04-JUL-2015 19:37:36

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date                04-JUL-2015 19:11:19
Uptime                    0 days 0 hr. 26 min. 17 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      ON
Listener Parameter File   /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbob1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.19)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.119)(PORT=1521)))
Services Summary...
Service "anbob.com" has 1 instance(s).
  Instance "anbob1", status READY, has 1 handler(s) for this service...
Service "anbob_XPT.anbob.com" has 1 instance(s).
  Instance "anbob1", status READY, has 1 handler(s) for this service...
The command completed successfully

[oracle@anbob1:/home/oracle]# /oracle/app/11.2.0.3/grid/bin/lsnrctl status

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 04-JUL-2015 19:38:08

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
TNS-12547: TNS:lost contact
 TNS-12560: TNS:protocol adapter error
  TNS-00517: Lost contact
   IBM/AIX RISC System/6000 Error: 32: Broken pipe
   
[oracle@anbob1:/home/oracle]# /oracle/app/11.2.0.3/grid/bin/lsnrctl status

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 04-JUL-2015 19:38:10

Copyright (c) 1991, 2011, Oracle.  All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date                04-JUL-2015 19:11:19
Uptime                    0 days 0 hr. 26 min. 51 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      ON
Listener Parameter File   /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbob1/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.19)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.119)(PORT=1521)))
Services Summary...
Service "anbob.com" has 1 instance(s).
  Instance "anbob1", status READY, has 1 handler(s) for this service...
Service "anbob_XPT.anbob.com" has 1 instance(s).
  Instance "anbob1", status READY, has 1 handler(s) for this service...
The command completed successfully


+--------------------------------------------------------------------------------------------------------------------------
+-topas_nmon--k=Kernel-stats-----Host=anbob1---------Refresh=2 secs---15:53.33---------------------------------------------+
| Memory ------------------------------------------------------------------------------------------------------------------|
|          Physical  PageSpace |        pages/sec  In     Out | FileSystemCache                                            |
|% Used       72.9%      0.9%  | to Paging Space   0.0    0.0 | (numperm)  7.7%                                            |
|% Free       27.1%     99.1%  | to File System    2.0    0.0 | Process   53.5%                                            |
|GB Used      87.5GB     0.3GB | Page Scans        0.0        | System    11.7%                                            |
|GB Free      32.5GB    31.7GB | Page Cycles       0.0        | Free      27.1%                                            |
|Total(GB)   120.0GB    32.0GB | Page Steals       0.0        |           ------                                           |
|                              | Page Faults    2009.9        | Total    100.0%                                            |
|------------------------------------------------------------ | numclient  7.7%                                            |
|Min/Maxperm     3574MB(  3%)  9532MB(  8%) <--% of RAM       | maxclient  8.0%                                            | 
|Min/Maxfree     960   1088       Total Virtual  152.0GB      | User      58.3%                                            | 
|Min/Maxpgahead    2      8    Accessed Virtual   77.6GB 51.0%| Pinned    14.4%                                            | |                                                             | lruable pages   30502384.0                                 | |-------------------------------------------------------------------------------------------- sys@anbob>select * from v$resource_limit;

select * from v$resource_limit;
RESOURCE_NAME                   CURRENT_UTILIZATION      MAX_UTILIZATION INITIAL_AL LIMIT_VALU
------------------------------ -------------------- -------------------- ---------- ----------
processes                                      1058                 1968       4000       4000
sessions                                       1081                 1993       4405       4405
enqueue_locks                                   507                 1244      53190      53190
enqueue_resources                               520                  885      20424  UNLIMITED
ges_procs                                      1057                 1966       4001       4001
ges_ress                                          0                    0      82503  UNLIMITED
ges_locks                                         0                    0     128193  UNLIMITED
ges_cache_ress                                 6730                63814          0  UNLIMITED
ges_reg_msgs                                    942                 3388       8750  UNLIMITED
ges_big_msgs                                    283                 2190       1934  UNLIMITED
ges_rsv_msgs                                      0                    0       1000       1000
gcs_resources                               1309030              2762250    2851766    2851766
gcs_shadows                                  787290               961880    2851766    2851766
dml_locks                                         7                  780      19380  UNLIMITED
temporary_table_locks                             0                  361  UNLIMITED  UNLIMITED
transactions                                      6               702674       4845  UNLIMITED
branches                                       1556                 3125       4845  UNLIMITED
cmtcallbk                                         0                   14       4845  UNLIMITED
sort_segment_locks                                1                  136  UNLIMITED  UNLIMITED
max_rollback_segments                          3314                 3314       4845      65535
max_shared_servers                                0                    0  UNLIMITED  UNLIMITED
parallel_max_servers                             17                  246         40       3600

22 rows selected.
      

Note:
发现监听问题依然存在,请原谅我用上面大篇幅贴结果,只是为了证明这种离奇的现象。cpu、内存资源充足,数据库连接数未满,在MOS 发现NOTE 1084443.1 搭述了现象很是接近,那里描述是aix 6引入新特性“ Traffic Regulation” 限制了tcp port 的范围,当client 尝试打开一个新socket时, 当发现socket达到最大上限时会提示上面的错误, 使用下面命令查看是否启用了该特性:

tcptr -show 

但是我这里的环境并没有并没有启用TCP Traffic Regulation policy, 之少给了排查的方向,查看一下当前的tcp 连接数。

[oracle@anbob1:/home/oracle]# ps -ef|wc -l
    1207
    
anbob1:/home/grid> netstat -an|wc -l
   62954   <<<<<<<
   
    
anbob1:/home/grid> netstat -an|grep 61020|wc -l
   56361  <<<<<<<
  

anbob1:/> lsattr -E -l sys0 | grep maxuproc
maxuproc        16384                                Maximum number of PROCESSES allowed per user      True

[oracle@anbob1:/home/oracle]# /usr/sbin/no -a | fgrep ephemeral
       tcp_ephemeral_high = 65500
        tcp_ephemeral_low = 9000
       udp_ephemeral_high = 65500
        udp_ephemeral_low = 9000

Note:
这里的显示的连接数好高,另一节点network连接数在5000左右,看一下连接明细满屏的tcp连接都是本机的private interconnect ip(如下),而且在都是通过61020 port发出, 这时送了一口气,不会是主机上某个程序的问题吧,这样就可以把球踢给OS了, 呵当然不好, 一个负责的DBA不能止步于此,下面要判断时哪个程序打开了这个端口的socket 连接。

...
tcp4       0      0  192.168.60.11.48817    192.168.60.11.61020    CLOSE_WAIT
tcp4       0      0  192.168.60.11.24318    192.168.60.11.61020    CLOSE_WAIT
tcp4       0      0  192.168.60.11.24319    192.168.60.11.61020    CLOSE_WAIT
tcp4       0      0  192.168.60.11.48818    192.168.60.11.61020    CLOSE_WAIT
tcp4       0      0  192.168.60.11.24320    192.168.60.11.61020    CLOSE_WAIT
tcp4       0      0  192.168.60.11.48819    192.168.60.11.61020    CLOSE_WAIT
tcp4       0      0  192.168.60.11.24321    192.168.60.11.61020    CLOSE_WAIT
tcp4       0      0  192.168.60.11.48820    192.168.60.11.61020    CLOSE_WAIT
...

下面就来演示如果根据socket port 定位process

anbob1:/> netstat  -aAn|grep CLOS*|head
f1000e003b08dbb8 tcp4       0      0  192.168.60.11.60433   192.168.60.11.61020   CLOSE_WAIT
f1000e00354a8bb8 tcp4       0      0  192.168.60.11.35934   192.168.60.11.61020   CLOSE_WAIT
f1000e0006fd5bb8 tcp4       0      0  192.168.60.11.11435   192.168.60.11.61020   CLOSE_WAIT
f1000e000546dbb8 tcp        0      0  *.*                   *.*                   CLOSED
f1000e0005527bb8 tcp4       0      0  *.*                   *.*                   CLOSED
f1000e0004bf93b8 tcp        0      0  *.*                   *.*                   CLOSED
f1000e0004becbb8 tcp4       0      0  *.*                   *.*                   CLOSED
f1000e0004b2ebb8 tcp        0      0  *.*                   *.*                   CLOSED
f1000e0004b34bb8 tcp4       0      0  *.*                   *.*                   CLOSED
f1000e00139d03b8 tcp4       0      0  192.168.60.11.35935   192.168.60.11.61020   CLOSE_WAIT

anbob1:/> kdb
           START              END <name>
0000000000001000 0000000004160000 start+000FD8
F00000002FF47600 F00000002FFDF9C8 __ublock+000000
000000002FF22FF4 000000002FF22FF8 environ+000000
000000002FF22FF8 000000002FF22FFC errno+000000
F1000F0A00000000 F1000F0A10000000 pvproc+000000
F1000F0A10000000 F1000F0A18000000 pvthread+000000
read vscsi_scsi_ptrs OK, ptr = 0x0


(0)> sockinfo f1000e003b08dbb8 tcpcb
---- TCPCB ----(@ F1000E003B08DBB8)----
    seg_next......@F1000E003B08DBB8  seg_prev......@F1000E003B08DBB8  
    t_softerror... 00000000 t_state....... 00000005 (CLOSE_WAIT)
    t_timer....... 00000000 (TCPT_REXMT)
    t_timer....... 00000000 (TCPT_PERSIST)
    t_timer....... 00000000 (TCPT_KEEP)
    t_timer....... 00000000 (TCPT_2MSL)
    t_rxtshift.... 00000000 t_rxtcur...... 00000004 t_dupacks..... 00000000 
    t_maxseg...... 000041CC t_force....... 00000000 
    t_flags....... 000003E0 (RFC1323|SENT_WS|RCVD_WS|SENT_TS|RCVD_TS|COPYFLAGS)
    t_oobflags.... 00000000 ()
    t_template....@0000000000000000  t_inpcb.......@F1000E003B08DAA0  
    t_iobc........ 00000000 t_timestamp... FAC3B601 snd_una....... 47F28D30 
    snd_nxt....... 47F28D30 snd_up........ 47F28D30 snd_wl1....... 866BAB50 
    snd_wl2....... 47F28D30 iss........... 47F28B50 
    snd_wnd....... 0000000000020E04 rcv_wnd....... 0000000000020E60 
    rcv_nxt....... 866BAB51 rcv_up........ 866BAB50 irs........... 866BAB4F 
    snd_wnd_scale. 00000002 rcv_wnd_scale. 00000002 req_scale_sent 00000002 
    req_scale_rcvd 00000002 last_ack_sent. 866BAB51 timestamp_rec. 5607037C 
    timestamp_age. 000E42DC rcv_adv....... 866DB9B1 snd_max....... 47F28D30 
    snd_cwnd...... 0000000000010730        snd_ssthresh.. 000000003FFFC000 
    t_idle........ 000042DC t_rtt......... 00000000 t_rtseq....... 47F28CD7 
    t_srtt........ 00000007 t_rttvar...... 00000004 t_rttmin...... 00000002 
    max_rcvd...... 0000000000000000        max_sndwnd.... 0000000000020E60 
    t_peermaxseg.. 000041CC snd_in_pipe... 00000000 
    sack_data.....@0000000000000000         snd_recover... 00000000 
    snd_high...... 47F28D30 snd_ecn_max... 47F28D30 snd_ecn_clear. 47F28D31 
    t_splice_with.@0000000000000000         t_splice_flags 00000000 


...
                               
---- SOCKET INFO ----(@ F1000E003B08D808)----
    type........ 0001 (STREAM) 
    opts........ 0000 ()       
    linger...... 0000 state....... FFFF81A2 (ISCONNECTED|CANTRCVMORE|PRIV|NBIO)
    pcb.....@F1000E003B08DAA0  proto...@0000000004875628  
    lock....@F1000E003B959080  head....@0000000000000000  
    q0......@0000000000000000  q.......@0000000000000000  
    q0len....... 0000 qlen........ 0000 qlimit...... 0000 
    timeo....... 0000 error....... 0000 special..... 0A08 
    pgid.... 0000000000000000  oobmark. 0000000000000000 
                               
snd:cc...... 0000000000000000  hiwat... 0000000000020E60 
    mbcnt... 0000000000000000  mbmax... 0000000000083980 
    lowat... 0000000000003FFF  mb......@0000000000000000  
    sel.....@0000000000000000  events...... 0000 
    iodone.. 00000000          ioargs..@0000000000000000  
    lastpkt.@F1000E0019F10C00  wakeone. FFFFFFFFFFFFFFFF 
    timer...@0000000000000000  timeo... 00000000 
    flags....... 0000 ()       
    wakeup.. 00000000          wakearg.@0000000000000000  
    lockwtg. FFFFFFFFFFFFFFFF  
    ....              
more
proc/fd:  3250/54028           
proc/fd: fd: 54028             
              SLOT NAME     STATE      PID    PPID          ADSPACE  CL #THS
                               
pvproc+32C800 3250*osysmond ACTIVE 0B2037C 0000001 0000000C2DA42590   0 000C    ######<<<<<<<<<<
                               
                               
(0)>  hcal 0B2037C
Value hexa: 00B2037C          Value decimal: 11666300

(0)> quit

anbob1:/> ps -ef|grep 11666300
    root 11666300        1   3   Apr 15      - 12970:57 /oracle/app/11.2.0.3/grid/bin/osysmond.bin
    root  1639464 19661020   0 14:10:21  pts/5  0:00 grep 11666300

Note:
这里我们又不得不把球抱起来,这是一个oracle的程序osysmond.bin, 定位程序在aix 还可以使用lsof 但是本机没有带,需要下载就一个执行程序

 [oracle@anbob1:/home/oracle]# ftp 133.96.60.19
ftp: connect: Resource temporarily unavailable

Note:
当时FTP 都提示connect 资源都不可用,可见OS的tcp 连接肯定出现了枯竭

anbob1:/>  lsof -i:61020

osysmond. 11666300 root *738u  IPv4 0xf1000e0041ff4bb8      0t0  TCP anbob1-prv:34166->anbob1-prv:61020 (CLOSE_WAIT)
osysmond. 11666300 root *739u  IPv4 0xf1000e0007091bb8      0t0  TCP anbob1-prv:35703->anbob1-prv:61020 (CLOSE_WAIT)
osysmond. 11666300 root *740u  IPv4 0xf1000e004283bbb8      0t0  TCP anbob1-prv:32682->anbob1-prv:61020 (CLOSE_WAIT)
...

TIP:
接下来在去MOS 以”osysmond.bin netstat”为关键字搜索,第一个就是我们想要的“Bug 15934834 osysmond.bin has lot of sockets open in close_wait state” 现象完全一致是因为osysmond.bin leaks socket. 但是MOS里 Workaround None, 而且当前环境也是在11.2.0.4 修复。

“Which socket states from netstat are candidates for the rmsock command?
FIN_WAIT_1,FIN_WAIT_2, TIME_WAIT, CLOSING are all states where the application performed an active close of the socket.

How do we know a socket is no longer associated with a process?
You can use lsof to check. It is highly recommend that you use netstat in conjunction with lsof to validate the situation prior to doing anything drastic.

Which socket states are not candidates for the rmsock command?
CLOSE_WAIT, CLOSED

停掉osysmond.bin进程就可以释放那些socket, 这里我是建议重启本节点的Instance和CRS后再观察, osysmond.bin是个monitor进程,有agent 守护,当天我们尝试最小修改通过OS kiil 了osysmond.bin ,agent自动重启了该进程且instence 未受影响,kill了该进程后socket释放,listener 恢复了正常。不过事后发现osysmond.bin 调用的socket还是有上涨趋势,如果下次有停机时间我们会尝试重启CRS.

Summary:
本次的问题是因为oracle bug 导致osysmond.bin 调用tcp socket 泄漏,导致占满了OS的TCP socket 连接,随后FTP 和监听变的异常,但是偶尔有连接释放socket,所以当时的监听表现是时好时坏。

 

–2015-7-29 updated

重启crs 也无法避免该问题,在11.2.0.3中 11.2.0.3.7 patch 修复了这个bug.

Patch description:  “Grid Infrastructure Patch Set Update : 11.2.0.3.7 (16742216)”

   Created on 24 Jun 2013, 22:52:45 hrs PST8PDT

   Bugs fixed:

     16619898, 16315641, 15876003, 14275572, 13919095, 13696251, 13348650

     12659561, 14305980, 14277586, 13987807, 14625969, 13825231, 12794268

     13000491, 13498267, 15934834, 11675721, 14082976, 12771830, 14515980

可以使用shell 加crontab 定时去检查

#!/bin/sh
###############################################################
## file: kill_bug15934834.sh
## 
## this scripts to clear connect tcp socket leak by osysmond.bin process
## Temporary workaround fixed bug 15934834
## author: weejar (anbob.com)
## date: 2015-7-29
## note: run as root, after kill osysmond process, it will automatically restart
##
################################################################

# logfile
LOGFILE=/oracle/lw/kill_bug15934834.log

# number of connect socket
NUM=`netstat -an|grep 61020|wc -l`

PSINFO=`ps -ef|grep "grid/bin/osysmond.bin"|grep -v grep`

# pid of osysmond.bin
PID=`echo $PSINFO|awk '{print $2}'`

echo "zzz ***"`date '+%a %b %e %T %Z %Y'` >> $LOGFILE
echo "Number of socket osysmond:$NUM" >> $LOGFILE

if [ $NUM -ge 50000 ]
then 
 echo `ps -ef|head -n 1` >> $LOGFILE
 echo $PSINFO >> $LOGFILE
 if [[ "" !=  "$PID" ]]; then
  echo "Warning: osysmond.bin pid $PID will be kill..." >> $LOGFILE
  kill -9 $PID
  if [ $? -ne 0 ]; then
   echo "Error: osysmond.bin pid $PID kill failed!" >> $LOGFILE
  fi
 fi
fi

打赏

对不起,这篇文章暂时关闭评论。