orphaned processes not cleanup when using resource limit(profile) in 9i RAC, cause skgpspawn failed(进程不回收)
Question:
Why is my database after running for some time , large number of database server processes will be created , but the number of sessions is very little, what actions cause orphaned processes? And sometimes hit ora-20 error, when I kill that OS processes manually every time,the problem is solved . and another node without symptoms.
Answer:
skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2
puts forward some ideas and methods for to solve this problem
1,Check free memory and swap on the server.
2,Check the oracle account’s process limits?
$ ulimit -a
..
bug
Read previous article skgpspawn failed:category = 27143 in alert log
The following is Step by step diagnostic methods in the case
DB env:
2 nodes 9i(9.2.0.4) rac on hp-ux
in alert.
###########################
skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2 skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2 skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2 skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2 skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2 skgpspawn failed:category = 27143, depinfo = 24, op = pipe, loc = skgpspawn2
# swapinfo -atm
when after cleaned up manually 500+ OS processes with OS kill -9 command, above output show total line PCT used 73% on the node.
when I try to find out difference processes between v$process and v$session using following SQL:
sql> select spid,username,program from v$process where addr not in(select paddr from v$session); spid username program ------ -------- -------------------------- 17062 oracle oracle@par3a (TNS V1-V3) 16711 oracle oracle@par3a (TNS V1-V3) 17064 oracle oracle@par3a (TNS V1-V3) ... 100+ process.
NOTE:
I not means number of processes always equal number of sessions, and Tom said “a process can have many sessions. a session may or may not have a process.” About Difference between Session and Process can read Arup Nanda’ note
Next I try to check what is program “oracle@par3a (TNS V1-V3)” from v$session view.
sql> select USERNAME ,STATUS ,SERVER,OSUSER,PROCESS,MACHINE,TERMINAL,PROGRAM from v$session where program like 'oracle@par3a%' USERNAME STATUS SERVER OSUSER PROCESS MACHINE TERMINAL PROGRAM ---------------------------------------------------------------------------------------------- dbuser01 INACTIVE DEDICATED Administrator 1909036 par3a oracle@par3a (TNS V1-V3) dbuser01 INACTIVE DEDICATED Administrator 1650968 par3a oracle@par3a (TNS V1-V3) dbuser01 INACTIVE DEDICATED Administrator 2146628 par3a oracle@par3a (TNS V1-V3) dbuser01 INACTIVE DEDICATED Administrator 2601276 par3a oracle@par3a (TNS V1-V3) dbuser01 INACTIVE DEDICATED Administrator 3330508 par3a oracle@par3a (TNS V1-V3) dbuser01 INACTIVE DEDICATED Administrator 262624 par3a oracle@par3a (TNS V1-V3)
Tip:
Upon inquiry is par3a machines(10g rac) extract data use database link with a application from the database. I used these sql script to make sure who is using db_link to conect me .
then I check user’s resource limit with user profile .
SQL>show parameter resource_limit --------- true SQL>select profile from dba_users where username='dbuser01'; ---------- IDLE_TIME_PROFILE SQL>select resource_name ,limit from dba_profiles where profile='IDLE_TIME_PROFILE' resource_name limit -------------- -------------- IDLE_TIME 90 ...
TIP:
The user had 1.5 hours IDLE_TIME resource limit, RESOURCE_LIMIT Valid values for the parameter are TRUE (enables enforcement) and FALSE. By default, this parameter’s value is set to FALSE IN 9I.
IDLE_TIME Specify the permitted periods of continuous inactive time during a session, expressed in minutes. Long-running queries and other operations are not subject to this limit.
“When a resource limit is exceeded (for example IDLE_TIME) … PMON does the following
* Mark the V$SESSION as SNIPED
* Clean up the database resources for the session
* Remove the V$SESSION entry(no v$process) When a resource limit is exceeded (for example IDLE_TIME) …
PMON marks the session as SNIPED in V$SESSION. Then, AFTER the SNIPED session tries to execute any SQL statement, its database resources are cleaned up and its V$SESSION entry is removed.”
sql> SELECT USERNAME,MACHINE,PROGRAM FROM V$SESSION WHERE STATUS=’SNIPED’;
It’s output show many connects of username is ‘dbuser01’ and program is ‘oracle@par3a (TNS V1-V3)’;
Tip:
SNIPED – An inactive session that has exceeded some configured limits (for example, resource limits specified for the resource manager consumer group or idle_time specified in the user’s profile). Such sessions will not be allowed to become active again.
It is strongly recommended that both DCD and Resource Limits with Profiles be implemented in order to clean up resources at both the database and OS level
This combination will not clean up IDLE / ABANDONED / INACTIVE connections (OS processes) as these sessions still have active clients
For this case we will see that :
* PMON has cleaned up the V$SESSION entries .. but both the OS processes and the V$PROCESS entries will still exist
* SQLNET will continue to be able to send the 10 byte packet successfully until the session is logged off
This condition can be a major problem as
* The database exhausts PROCESSES and gives ORA-20 maximum number of processes <num> exceeded
* The OS can become exhausted due to the unneeded resources consumed by the abandoned processes
The SYMPTOMS of this condition are
* The database view V$PROCESS will have no corresponding V$SESSION entry
* An OS process / thread still exists for the SNIPED session
The solutions to this scenario can are to cleanup the OS processes … after which the V$PROCESS entries should be removed automatically
Cause
Summary of what was discussed here:
1) DCD initiates clean up of OS and database processes that have disconnected / terminated abnormally
2) DCD will not initiate clean up sessions that are still connected … but are idle / abandoned / inactive
3) Database Resource Limits + user Profiles clean up database resources for user sessions that exceed resource limits
4) Database Resource Limits + user Profiles will not clean up OS processes
5) If DCD and Database Resource Limits + user Profiles are used in combination .. Dead Connections OS and Database Resources will be cleaned up
6) IDLE / ABANDONED / INACTIVE sessions OS processes will not be cleaned up even if DCD and Database Resource Limits + user Profiles are used in combination … these must be cleaned up manually
Then I try to dump a OS process id 17062 not in v$session processstat .
oradebug setospid 17062
oradebug dump processstate 10
oradebug tracefile_name
/u01/app/oracle/admin/dbzy/udump/dbzy1_ora_17062.trc
Oracle9i Enterprise Edition Release 9.2.0.4.0 – 64bit Production
With the Partitioning, Real Application Clusters, OLAP and Oracle Data Mining options
JServer Release 9.2.0.4.0 – Production
ORACLE_HOME = /u01/app/oracle/product/9.2.0
System name: HP-UX
Node name: hbdbzy1
Release: B.11.23
Version: U
Machine: ia64
Instance name: dbzy1
Redo thread mounted by this instance: 1
Oracle process number: 34
Unix process pid: 17062, image: oracle@hbdbzy1 (TNS V1-V3)
*** 2014-08-26 14:06:50.741
*** SESSION ID:(17.45) 2014-08-26 14:06:50.709
===================================================
PROCESS STATE
————-
Process global information:
process: c00000044dad8440, call: 0000000000000000, xact: 0000000000000000, curses: 0000000000000000, usrses: 0000000000000000
—————————————-
SO: c00000044dad8440, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00
(process) Oracle pid=34, calls cur/top: 0000000000000000/c00000045d55c6d0, flag: (0) –
int error: 0, call error: 0, sess error: 0, txn error 0
(post info) last post received: 0 0 0
last post received-location: No post
last process to post me: none
last post sent: 0 0 0
last post sent-location: No post
last process posted by me: none
(latch info) wait_event=0 bits=0
Process Group: DEFAULT, pseudo proc: c00000044dc08080
O/S info: user: oracle, term: UNKNOWN, ospid: 17062
OSD pid info: Unix process pid: 17062, image: oracle@hbdbzy1 (TNS V1-V3)
—————————————-
SO: c00000044eebff48, type: 13, owner: c00000044dad8440, flag: INIT/-/-/0x00
(broadcast handle) flag: (2) ACTIVE SUBSCRIBER, owner: c00000044dad8440,
event: 216, last message event: 216,
last message waited event: 216, messages read: 0
channel: (c00000044eefbb40) system events broadcast channel
scope: 0, event: 43381, last mesage event: 0,
publishers/subscribers: 0/154,
messages published: 0
—————————————-
SO: c000000450613800, type: 19, owner: c00000044dad8440, flag: INIT/-/-/0x00
GES MSG BUFFERS: st=emp chunk=0x0000000000000000 hdr=0x0000000000000000 lnk=0x0000000000000000 flags=0x0
outq=0 sndq=0 opid=0 prmb=0x0
mbg[i]=(0 0) mbg[b]=(0 0) mbg[r]=(0 0)
fmq[i]=(0 0) fmq[b]=(0 0) fmq[r]=(0 0)
mop[s]=0 mop[q]=0 pendq=0 workq=0
————process———————
proc version : 6
Local node : 0
gid : 0
pid : 17062
lkp_node : 0
proc state : KJP_FROZEN –Frozen process
Total accesses : 55
Imm. accesses : 54
Locks on ASTQ : 0
Locks Pending AST : 0
Granted locks : 0
AST_Q :NULL
PENDING_Q :NULL
GRANTED_Q :NULL
AST_Q:
PENDING_Q:
GRANTED_Q:
—————————————-
SO: c00000045d55c6d0, type: 3, owner: c00000044dad8440, flag: INIT/-/-/0x00
(call) sess: cur 0, rec 0, usr 0; depth: 0
END OF PROCESS STATE
References:
A discussion of Dead Connection Detection, Resource Limits, V$SESSION, V$PROCESS and OS processes ( ID 601605.1)
目前这篇文章有1条评论(Rss)评论关闭。