How to release still “killed“ status session in v$session? (释放killed的session) (四)
How to release still “killed“ status session in v$session? (释放killed的session) (一)
How to release still “killed“ status session in v$session? (释放killed的session) (二)
How to release still “killed“ status session in v$session? (释放killed的session) (三)
继续一个killed无法释放的案例, 环境是11.2.0.4 2nodes-RAC on SELS11, 开始是数据库突然出现了非常高的负载wait event是“enq: SV – contentio”,“row cache lock”,“enq: SQ – contention ”, 是与sequnece相关,SQ较为常见不再描述, 当多实例争用sequence时,如果sequence是ORDER and NOCACHE主要的wait event 是row cache lock, 当sequence是cached通常出现的是SV。 因为这个事件KILL了一批进程,后期发现有个进程始终是KILLED状态,并且同时影响后期的AWR都无法生成。
SQL>@ase USERNAME SID EVENT MACHINE MODULE STATUS LAST_CALL_ET SQL_ID WAI_SECINW ROW_WAIT_OBJ# SQLTEXT BS CH# OSUSER HEX ---------- ---------- -------------------- ---------- -------------------- -------- ------------ --------------- ---------- ------------- ------------------------------ ---------- ---- ---------- --------- PATROLDB 337 library cache: mutex kdechan1 SQL*Plus ACTIVE 18858 2kxmfvbpdkbj4 0:469 663378 select 'DATAPOINT ' bpb, sql_t 1:4851 0 itmuser 1000000 NGESHOP 4851 On CPU / runqueue qdtza1 oracle KILLED 95113 axyfr1t8k6xy7 -1:95111 128735 : grid SQL> @usid 4851 USERNAME SID AUDSID OSUSER MACHINE PROGRAM SPID OPID CPID SQL_ID HASH_VALUE LASTCALL STATUS SADDR PADDR TADDR LOGON_TIME ----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------------------ --------------- ----------- ---------- ------ -- ---------------- ---------------- ---------------- ----------------- NGESHOP '4851,22847' 497193017 grid qdtza1 (TNS V1-V3) 33793 192 21244 axyfr1t8k6xy7 1361278919 20965 KILLED 0000000B31C705F0 0000000B5124B188 0000000B2AB94178 20190123 07:00:54
Note:
当前SESSION id 4851是KILLED状态,并且堵塞了其它会话。当然根据会话信息我们确认了客户端是一个dblink调用。
SQL> @st dba_2pc_pending no rows selected SQL> SELECT /*+ ORDERED */ 2 SUBSTR (s.ksusemnm, 1, 10) || '-' || SUBSTR (s.ksusepid, 1, 10) 3 "ORIGIN", 4 SUBSTR (g.K2GTITID_ORA, 1, 35) "GTXID", 5 SUBSTR (s.indx, 1, 4) || '.' || SUBSTR (s.ksuseser, 1, 5) "LSESSION", 4 SUBSTR (g.K2GTITID_ORA, 1, 35) "GTXID", 5 SUBSTR (s.indx, 1, 4) || '.' || SUBSTR (s.ksuseser, 1, 5) "LSESSION", 6 s2.username, 7 SUBSTR ( 8 DECODE ( 9 BITAND (ksuseidl, 11), 10 1, 'ACTIVE', 11 0, DECODE (BITAND (ksuseflg, 4096), 0, 'INACTIVE', 'CACHED'), 12 2, 'SNIPED', 13 3, 'SNIPED', 14 'KILLED'), 15 1, 16 10) 17 "Status", 18 SUBSTR (s2.event, 1, 10) "WAITING" 19 FROM x$k2gte g, 20 x$ktcxb t, 21 x$ksuse s, 22 v$session s2 23 WHERE 24 g.K2GTDXCB = t.ktcxbxba 25 AND g.K2GTDSES = t.ktcxbses 26 AND s.addr = g.K2GTDSES 27 AND s2.sid = s.indx; ORIGIN GTXID LSESSION USERNAME Status WAITING ----------------------------------------- ---------------------------------------------------------------------- ------------------- ---------- ---------------- -------------------- qdtza1-21244 UCISA.HEBEI.MOBILE.COM.53c50444.361 4851.22847 NGESHOP KILLED SQL*Net me
# Pmon trace
*** 2019-01-23 18:10:42.356 KGX Atomic Operation Log 0xb97f5f4b8 Mutex 0xb68781710(4851, 0) idn 565cf76d oper GET_EXCL(5) Library Cache uid 303 efd 9 whr 76 slp 12366 oper=0 pt1=0xb687815d0 pt2=(nil) pt3=(nil) pt4=(nil) pt5=(nil) ub4=0 *** 2019-01-23 18:10:52.371 found process 0xb5124b188 pid=192 serial=42 ospid = 33793 dead KGX cleanup... KGX Atomic Operation Log 0x2a4ed9008 Mutex 0xb68781710(4851, 0) idn 565cf76d oper EXCL(6) Library Cache uid 4851 efd 13 whr 98 slp 0 oper=12 pt1=0xb687815d0 pt2=0x197f140b0 pt3=(nil) pt4=(nil) pt5=(nil) ub4=0 LibraryHandle: Address=0xb687815d0 Hash=565cf76d LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD ObjectName: Name=SELECT COUNT(*) FROM "NGESHOP"."ES_ORDER_*********" "A1" WHERE "A1"."ORDER_ID"='aaf3ea904147492fbd5be1dd14d096b8' FullHashValue=243baddfd5dc6ffdb59840a3565cf76d Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1448933229 OwnerIdn=246 *** 2019-01-23 18:13:22.492 KGX cleanup... KGX Atomic Operation Log 0xb97f5f4b8 Mutex 0xb68781710(4851, 0) idn 565cf76d oper GET_EXCL(5) Library Cache uid 303 efd 9 whr 76 slp 12407 oper=0 pt1=0xb687815d0 pt2=(nil) pt3=(nil) pt4=(nil) pt5=(nil) ub4=0 *** 2019-01-23 18:13:32.504 found process 0xb5124b188 pid=192 serial=42 ospid = 33793 dead KGX cleanup... KGX Atomic Operation Log 0x2a4ed9008 Mutex 0xb68781710(4851, 0) idn 565cf76d oper EXCL(6) Library Cache uid 4851 efd 13 whr 98 slp 0 oper=12 pt1=0xb687815d0 pt2=0x197f140b0 pt3=(nil) pt4=(nil) pt5=(nil) ub4=0 LibraryHandle: Address=0xb687815d0 Hash=565cf76d LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD ObjectName: Name=SELECT COUNT(*) FROM "NGESHOP"."ES_ORDER_*********" "A1" WHERE "A1"."ORDER_ID"='aaf3ea904147492fbd5be1dd14d096b8' FullHashValue=243baddfd5dc6ffdb59840a3565cf76d Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1448933229 OwnerIdn=246 *** 2019-01-23 18:16:02.605 KGX cleanup... KGX Atomic Operation Log 0xb97f5f4b8 Mutex 0xb68781710(4851, 0) idn 565cf76d oper GET_EXCL(5) Library Cache uid 303 efd 9 whr 76 slp 12330 oper=0 pt1=0xb687815d0 pt2=(nil) pt3=(nil) pt4=(nil) pt5=(nil) ub4=0 *** 2019-01-23 18:16:12.620 found process 0xb5124b188 pid=192 serial=42 ospid = 33793 dead KGX cleanup... KGX Atomic Operation Log 0x2a4ed9008 Mutex 0xb68781710(4851, 0) idn 565cf76d oper EXCL(6) Library Cache uid 4851 efd 13 whr 98 slp 0 oper=12 pt1=0xb687815d0 pt2=0x197f140b0 pt3=(nil) pt4=(nil) pt5=(nil) ub4=0 LibraryHandle: Address=0xb687815d0 Hash=565cf76d LockMode=0 PinMode=0 LoadLockMode=0 Status=VALD ObjectName: Name=SELECT COUNT(*) FROM "NGESHOP"."ES_ORDER_*********" "A1" WHERE "A1"."ORDER_ID"='aaf3ea904147492fbd5be1dd14d096b8' FullHashValue=243baddfd5dc6ffdb59840a3565cf76d Namespace=SQL AREA(00) Type=CURSOR(00) Identifier=1448933229 OwnerIdn=246
# hanganalyze trace
------------------------------------------------------------------------------- Chain 1: ------------------------------------------------------------------------------- Oracle session identified by: { instance: 1 (echandb.echandb1) os id: 29849 process id: 2, oracle@kdechan1 (PMON) session id: 303 session serial #: 1 } is waiting for 'library cache: mutex X' with wait info: { p1: 'idn'=0x565cf76d p2: 'value'=0x12f300000000 p3: 'where'=0x4c time in wait: 21.160256 sec timeout after: never wait id: 19282212 blocking: 0 sessions wait history: * time between current wait and wait #1: 0.021533 sec 1. event: 'pmon timer' time waited: 10.009346 sec wait id: 19282211 p1: 'duration'=0x3e8 * time between wait #1 and #2: 0.012146 sec 2. event: 'library cache: mutex X' time waited: 2 min 29 sec wait id: 19282210 p1: 'idn'=0x565cf76d p2: 'value'=0x12f300000000 p3: 'where'=0x4c * time between wait #2 and #3: 0.021147 sec 3. event: 'pmon timer' time waited: 10.010484 sec wait id: 19282209 p1: 'duration'=0x3e8 } and is blocked by => Oracle session identified by: { instance: 1 (echandb.echandb1) os id: 33793 (DEAD) process id: 192, oracle@kdechan1 session id: 4851 session serial #: 22847 } which is not in a wait: { last wait: 202 min 18 sec ago blocking: 4 sessions wait history: 1. event: 'SQL*Net message from client' time waited: 0.001799 sec wait id: 13552700 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #1 and #2: 0.000010 sec 2. event: 'SQL*Net message to client' time waited: 0.000002 sec wait id: 13552699 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 * time between wait #2 and #3: 0.000240 sec 3. event: 'SQL*Net message from client' time waited: 0.002142 sec wait id: 13552698 p1: 'driver id'=0x54435000 p2: '#bytes'=0x1 } Chain 1 Signature: <not in a wait><='library cache: mutex X' Chain 1 Signature Hash: 0x6d47f496
# systemstate dump trace
PROCESS 192:
—————————————-
SO: 0xb5124b188, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0xb5124b188, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:192, ser:42, calls cur/top: 0xadbd83798/0xadbd83798
flags : (0x1) DEAD
flags2: (0x8000), flags3: (0x10)
intr error: -2147483620, call error: 0, sess error: 0, txn error 0
intr queue: 2147483676 2147483676 2147483676
ksudlp FALSE at location: 0
Cleanup details:
Found dead = 244 min 37 sec ago
Total Cleanup attempts = 99, Cleanup time = 235 min 34 sec, Cleanup timer = 235 min 30 sec
Last attempt (full) started 11 sec ago, Length = in progress
(post info) last post received: 0 0 27
last post received-location: ksa2.h LINE:289 ID:ksasnr
last process to post me: 0xb5125cdc0 65 6
last post sent: 0 0 26
last post sent-location: ksa2.h LINE:285 ID:ksasnd
last process posted by me: 0xb5125cdc0 65 6
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0xb993e85d0
skgvtime: process 33793 unix pid wrap detected 1496387522 1499482369
O/S info: user: grid, term: UNKNOWN, ospid: 33793 (DEAD)
OSD pid info: Unix process pid: 33793, image: oracle@kdechan1
Short stack dump: ORA-00072: process “Unix process pid: 33793, image: oracle@kdechan1” is not active
PROCESS 2: PMON
—————————————-
SO: 0xb591b8490, type: 2, owner: (nil), flag: INIT/-/-/0x00 if: 0x3 c: 0x3
proc=0xb591b8490, name=process, file=ksu.h LINE:12721, pg=0
(process) Oracle pid:2, ser:1, calls cur/top: 0xb59e5a9d8/0xb59e5a9d8
flags : (0xe) SYSTEM
flags2: (0x0), flags3: (0x10)
intr error: 0, call error: 0, sess error: 0, txn error 0
intr queue: empty
ksudlp FALSE at location: 0
(post info) last post received: 0 0 16
last post received-location: ksu.h LINE:14056 ID:ksupsc
last process to post me: 0xb591efaa8 209 0
last post sent: 0 0 38
last post sent-location: ksr2.h LINE:641 ID:ksrchdelete
last process posted by me: 0xb39189cc0 2 6
(latch info) wait_event=0 bits=0x0
Process Group: DEFAULT, pseudo proc: 0xb993e85d0
O/S info: user: oracle, term: UNKNOWN, ospid: 29849
OSD pid info: Unix process pid: 29849, image: oracle@kdechan1 (PMON)
Short stack dump:
ksedsts()+465<-ksdxfstk()+32<-ksdxcb()+1927<-sspuser()+112<-__sighandler()<-semtimedop()+10<-skgpwwait()+178
<-ksliwat()+2022<-kslwaitctx()+163<-ksfwaitctx()+14<-kgxWait()+1330<-kgxExclusive()+447<-kglGetMutex()+212<-kglobfr()+1010<-kgllccl()+1657
<-kglMutexRecovery()+661<-kgxCleanup()+548<-kglMutexCleanupAll()+437<-kglMutexCleanup()+13<-kksCleanSessionState()
+232<-ksudlp()+202<-kssxdl()+489<-kssdel()+644<-ksuxdl()+398<-ksuxda_del_proc()+166<-ksucln_dpc_cleanup()+259<-ksucln_dpc_dfs()
+248<-ksucln_dpc_main()+1188<-ksucln_dpc()+29<-ksucln()+1250<-ksbrdp()+1045<-opirip()+623<-opidrv()+603<-sou2o()+103
<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230
当前看到是在清理一个分布式事务的dead 会话时失败,根据Pmon的call stack在MOS中检索出BUG。
AWR or ADDM Report Generation Hangs Waiting on ‘library cache:mutex X’ Event ( Doc ID 1960432.1 )
Bug 13542050 : USE OF KGL MUTEXES MIGHT BLOCK ON BOGUS MUTEX HOLDER.
Bug 13542050 – A mutex related hang with holder around 65534 (0xfffe) – superseded ( Doc ID 13542050.8 )
Note that this fix has been superseded by the fix in Bug:24739928
Symptoms:
Hang (Involving Shared Resource)
Mutex Contention
Waits for “cursor: pin X”
Waits for “library cache: mutex X”
Stack is likely to include kgxWait
最终也是重启instance 释放些killed session.
对不起,这篇文章暂时关闭评论。