Troubleshooting ORA-12519 or ORA-12516, listener service 'blocked', and wait event 'latch: ges resource hash list'
[oracle@kdexa1db01 (orarpt1)~]$ tnsping anbob2

TNS Ping Utility for Linux: Version 11.2.0.3.0 - Production on 05-JUN-2015 10:39:14
Copyright (c) 1997, 2011, Oracle. All rights reserved.

Used parameter files:

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = 133.96.60.16)(PORT = 1521)) (CONNECT_DATA = (SID = anbobc2)))
OK (10 msec)

[oracle@kdexa1db01 (orarpt1)~]$ sqlplus weejar@anbob2

SQL*Plus: Release 11.2.0.3.0 Production on Fri Jun 5 10:40:36 2015
Copyright (c) 1982, 2011, Oracle. All rights reserved.

Enter password:
ERROR:
ORA-12519: TNS:no appropriate service handler found
TIP:
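ORA-12519 or ORA-12516 usually means that the number of database processes has exceeded the processes or sessions parameter, and the DB alert log normally shows ORA-00020 or ORA-00018 at the same time. If you check the listener with lsnrctl service at that moment, the service is in the "blocked" state. The listener learns the instance's current load, connection count, and processes limit from the PMON process. PMON does not push these updates every second; the interval is usually somewhere between 5 seconds and 10 minutes, and PMON shortens it on its own when the load changes quickly. The listener itself counts the sessions it hands off, but it does not learn immediately when a session disconnects. So once a service on the listener is "blocked", new connection requests succeed again only after PMON's next update reports that the current connection count is below the processes parameter (sessions is a limit too, and it includes the recursive sessions the system creates before SQL parsing). I ran into such a case today and share my approach here. I hope it is useful.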
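As a quick way to spot the "blocked" state described above, the following is a minimal sketch that scans the text output of lsnrctl service and prints every service/instance pair whose handler reports state:blocked. The function name find_blocked_handlers is hypothetical, and the sketch assumes the usual indented layout of the listener output (a Service line, then an Instance line, then the handler line).

```shell
# Hypothetical helper: read "lsnrctl service" output on stdin and print
# "<service> <instance>" for every handler that reports state:blocked.
find_blocked_handlers() {
  awk '
    /^[ \t]*Service "/  { svc = $2;  gsub(/[",]/, "", svc) }   # remember current service
    /^[ \t]*Instance "/ { inst = $2; gsub(/[",]/, "", inst) }  # remember current instance
    /state:blocked/     { print svc, inst }                    # handler is blocked
  '
}

# Example (assumes lsnrctl is on the PATH of the listener owner):
#   lsnrctl service | find_blocked_handlers
```

An empty result means no handler is blocked at that moment; it says nothing about why a handler became blocked.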
First, look at the database's resource usage at the time of the problem.
SQL> select * from v$resource_limit;

RESOURCE_NAME                  CURRENT_UTILIZATION MAX_UTILIZATION INITIAL_AL LIMIT_VALU
------------------------------ ------------------- --------------- ---------- ----------
processes                                     1057            1247       4000       4000 <<<
sessions                                      1093            1249       4405       4405 <<<
enqueue_locks                                21693           53190      53190      53190
enqueue_resources                              780            1039      20424  UNLIMITED
ges_procs                                     1056            1245       4001       4001
ges_ress                                         0               0      82503  UNLIMITED
ges_locks                                        0               0     128193  UNLIMITED
ges_cache_ress                                7417           95371          0  UNLIMITED
ges_reg_msgs                                  1223            2996       8750  UNLIMITED
ges_big_msgs                                   245             442       1934  UNLIMITED
ges_rsv_msgs                                     0               0       1000       1000
gcs_resources                              1400382         2253096    2421479    2421479
gcs_shadows                                 726267         1520777    2421479    2421479
dml_locks                                      126            2469      19380  UNLIMITED
temporary_table_locks                            8             352  UNLIMITED  UNLIMITED
transactions                                    16            2472       4845  UNLIMITED
branches                                        37             136       4845  UNLIMITED
cmtcallbk                                        0               3       4845  UNLIMITED
sort_segment_locks                             106             125  UNLIMITED  UNLIMITED
max_rollback_segments                          170             185       4845      65535
max_shared_servers                               0               0  UNLIMITED  UNLIMITED
parallel_max_servers                            26              42         40       3600
TIP:
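At that time the database's process and session counts had not even reached 50% of the processes parameter; a count on v$session confirms this. So can a new connection be created on the DB host itself, bypassing the listener?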
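The utilization check can be made explicit with a small calculation. The sketch below takes rows of v$resource_limit output as plain text (resource name, current utilization, max utilization, initial allocation, limit value) and prints the current utilization as a percentage of the limit. The function name resource_usage_pct and the default 80% warning threshold are assumptions made for illustration; rows whose limit is UNLIMITED are skipped.

```shell
# Hypothetical helper: read v$resource_limit rows on stdin and print
# "<resource> <percent of limit> <HIGH|ok>".
# $1 is the warning threshold in percent (assumed default: 80).
resource_usage_pct() {
  awk -v thr="${1:-80}" '
    # Only rows with a numeric, non-zero LIMIT_VALUE can be expressed as a percentage.
    $5 ~ /^[0-9]+$/ && $5 > 0 {
      pct  = 100 * $2 / $5
      flag = (pct >= thr) ? "HIGH" : "ok"
      printf "%s %.1f %s\n", $1, pct, flag
    }
  '
}

# Example with the two rows highlighted in the output above:
#   printf 'processes 1057 1247 4000 4000\nsessions 1093 1249 4405 4405\n' | resource_usage_pct 50
```

With the figures from this case, processes is at roughly 26% and sessions at roughly 25% of the limit, which is why the limits themselves could be ruled out.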
[oracle:/home/oracle]# sqlplus weejar/weejar_dba123

SQL*Plus: Release 10.2.0.5.0 - Production on Fri Jun 5 10:45:01 2015
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> exit
Disconnected from Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

[oracle:/home/oracle]# sqlplus weejar/weejar_dba123@133.96.60.16:1521/anbobc.com

SQL*Plus: Release 10.2.0.5.0 - Production on Fri Jun 5 10:45:20 2015
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.

ERROR:
ORA-12516: TNS:listener could not find available handler with matching protocol stack
NOTE:
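The tests above confirm that the database had not actually reached its maximum connection limit, so the ORA-125xx errors at connect time must come from a limit on the listener side. Next, check the listener status.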
[oracle:/home/oracle]# ps -ef|grep lsnr
    grid  8519706        1   0   May 15      - 17:08 /oracle/app/11.2.0.3/grid/bin/tnslsnr LISTENER -inherit
  oracle 26411174 34931032   0 10:45:30  pts/2  0:00 grep lsnr
    grid 12058930        1   0   May 15      -  0:32 /oracle/app/11.2.0.3/grid/bin/tnslsnr LISTENER_SCAN1 -inherit

[oracle:/home/oracle]# /oracle/app/11.2.0.3/grid/bin/lsnrctl status

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 10:45:45
Copyright (c) 1991, 2011, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date                15-MAY-2015 02:26:10
Uptime                    21 days 8 hr. 19 min. 36 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      ON
Listener Parameter File   /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.16)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.116)(PORT=1521)))
Services Summary...
Service "anbobc.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
Service "anbobc_XPT.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
The command completed successfully

[oracle:/home/oracle]# /oracle/app/11.2.0.3/grid/bin/lsnrctl service

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 10:45:55
Copyright (c) 1991, 2011, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Services Summary...
Service "anbobc.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:190741 refused:0 state:blocked <<<<<<<<
         LOCAL SERVER
Service "anbobc_XPT.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:190741 refused:0 state:blocked
         LOCAL SERVER
The command completed successfully
TIP:
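This database is 10.2.0.5 running under 11.2.0.3 GI, and the listener runs as the grid user. The services were still registered with the listener; only their state was already "blocked". A shortage of host resources can also prevent new connections from being created, and in earlier versions an oversized listener.log file could do the same; both causes were ruled out.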
+-topas_nmon--l=LongTerm-CPU-----Host=anbob2---------Refresh=2 secs---10:51.45---------------
| Memory ------------------------------------------------------------------------------------
|          Physical  PageSpace |        pages/sec  In     Out | FileSystemCache
|% Used       58.3%      0.3%  | to Paging Space   0.0    0.0 | (numperm)  9.7%
|% Free       41.7%     99.7%  | to File System    0.0    1.5 | Process   42.9%
|GB Used      70.0GB     0.2GB | Page Scans        0.0        | System     5.7%
|GB Free      50.0GB    63.8GB | Page Cycles       0.0        | Free      41.7%
|Total(GB)   120.0GB    64.0GB | Page Steals       0.0        |           ------
|                              | Page Faults    8280.1        | Total    100.0%
|------------------------------------------------------------ | numclient  9.7%
|Min/Maxperm     3574MB(  3%)  23827MB( 20%) <--% of RAM      | maxclient 10.0%
|Min/Maxfree     960   1088       Total Virtual  184.0GB      | User      49.6%
|Min/Maxpgahead    2      8    Accessed Virtual   57.9GB 31.4%| Pinned     8.6%
|                                                       20.0  | lruable pages 304991
Next, check the CRS resource status.
anbob2:/home/grid> crsctl stat res -t
--------------------------------------------------------------------------------
NAME           TARGET  STATE        SERVER                   STATE_DETAILS
--------------------------------------------------------------------------------
Local Resources
--------------------------------------------------------------------------------
ora.LISTENER.lsnr
               ONLINE  ONLINE       anbob1
               ONLINE  ONLINE       anbob2
ora.asm
               OFFLINE OFFLINE      anbob1
               OFFLINE OFFLINE      anbob2
ora.gsd
               OFFLINE OFFLINE      anbob1
               OFFLINE OFFLINE      anbob2
ora.net1.network
               ONLINE  ONLINE       anbob1
               ONLINE  ONLINE       anbob2
ora.ons
               ONLINE  ONLINE       anbob1
               ONLINE  ONLINE       anbob2
ora.registry.acfs
               ONLINE  OFFLINE      anbob1
               ONLINE  OFFLINE      anbob2
--------------------------------------------------------------------------------
Cluster Resources
--------------------------------------------------------------------------------
ora.LISTENER_SCAN1.lsnr
      1        ONLINE  ONLINE       anbob2
ora.cvu
      1        ONLINE  ONLINE       anbob2
ora.anbob1.vip
      1        ONLINE  ONLINE       anbob1
ora.anbob2.vip
      1        ONLINE  ONLINE       anbob2
ora.oc4j
      1        ONLINE  ONLINE       anbob2
ora.scan1.vip
      1        ONLINE  ONLINE       anbob2
TIP:
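Time was pressing, and connectivity had to be restored first. Fortunately node 1 was still healthy at this point, so new connections were being redirected to it. My guess was that the connection-count information held by the listener was wrong, so I tried restarting the listener.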
su - grid
srvctl stop listener -n anbob2
srvctl start listener -n anbob2
su - oracle
alter system register;
NOTE:
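Checking the listener again showed that the services had not registered. Logging in to the database and registering manually did not work either, nor did pointing local_listener at a tnsnames.ora alias (which is not needed for the default port 1521). Since dynamic registration kept failing, I gave up on it and used static registration instead. Static registration with an 11g listener differs from earlier versions; I used the netmgr GUI and did not record the details. After static registration succeeded, I restarted the listener once more and retried the connection.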
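Because the netmgr steps were not recorded, the following is only a generic sketch of what a static registration entry in listener.ora looks like, not a reconstruction of what was actually configured. The function name print_static_sid_list is hypothetical, and the database ORACLE_HOME must be supplied by the caller since it does not appear anywhere in this post.

```shell
# Hypothetical helper: print a SID_LIST stanza for static registration.
#   $1 listener name, $2 global database name, $3 SID, $4 database ORACLE_HOME
print_static_sid_list() {
  cat <<EOF
SID_LIST_$1 =
  (SID_LIST =
    (SID_DESC =
      (GLOBAL_DBNAME = $2)
      (ORACLE_HOME = $4)
      (SID_NAME = $3)
    )
  )
EOF
}

# Example; /path/to/db_home is a placeholder, not a value from this system:
#   print_static_sid_list LISTENER anbobc.com anbobc2 /path/to/db_home
```

A statically registered instance is reported by lsnrctl with status UNKNOWN, because the listener has no PMON-supplied state for it. The stanza would go into the listener.ora of the home that owns the listener, followed by a listener restart or reload.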
[oracle@anbob2:/home/oracle]#/oracle/app/11.2.0.3/grid/bin/lsnrctl status

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 17:21:10
Copyright (c) 1991, 2011, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date                05-JUN-2015 12:07:11
Uptime                    0 days 5 hr. 13 min. 59 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      ON
Listener Parameter File   /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.16)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.116)(PORT=1521)))
Services Summary...
Service "anbobc.com" has 2 instance(s).
  Instance "anbobc2", status UNKNOWN, has 1 handler(s) for this service...

# sqlplus weejar/weejar_dba123@133.96.60.16:1521/anbobc.com

SQL*Plus: Release 10.2.0.5.0 - Production on Fri Jun 5 13:28:42 2015
Copyright (c) 1982, 2010, Oracle. All Rights Reserved.

Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL>
NOTE:
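The new connection now succeeded, which was a relief. Next came the root-cause analysis: why was the listener's connection count wrong, and why could the services not register dynamically with the listener? As noted at the start, both tasks are done by PMON, so the next step was to check whether the PMON background process was in trouble.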
# DB alert no ora-18 and ora-20
Fri Jun 05 10:45:46 BEIST 2015
PMON failed to acquire latch, see PMON dump
Fri Jun 05 10:54:38 BEIST 2015

# pmon trace
*** 2015-06-05 10:45:46.237
PMON unable to acquire latch 700000ad1a9ee98 Child ges resource hash list level=1 child#=1254
        Location from where latch is held: kjrmas1: lookup master node:
        Context saved from call: 504403196568791072
        state=busy, wlstate=free
          waiters [orapid (seconds since: put on list, posted, alive check)]:
           1064 (0, 1433476592, 0)
           1051 (0, 1433476592, 0)
           1058 (0, 1433476592, 0)
           1048 (0, 1433476592, 0)
           1050 (0, 1433476592, 0)
           1055 (0, 1433476592, 0)
           1031 (0, 1433476592, 0)
           waiter count=7
          gotten 20376414 times wait, failed first 10108322 sleeps 15878606 <<<<<<<<<<<<<<<<<<
          gotten 11598 times nowait, failed: 59605
        recovery area:
Dump of memory from 0x0700000AD021B280 to 0x0700000AD021B280
    possible holder pid = 1066 ospid=46072128 <<<<<<<<<<<<<<<<<<<<<
----------------------------------------
SO: 700000ad571b5e8, type: 2, owner: 0, flag: INIT/-/-/0x00
  (process) Oracle pid=1066, calls cur/top: 700000a7756c988/700000a333d74a8, flag: (0) -
            int error: 0, call error: 0, sess error: 0, txn error 0
  (post info) last post received: 0 0 55
              last post received-location: kjata: wake up enqueue owner
              last process to post me: 700000ad86f6730 1 6
              last post sent: 0 0 108
              last post sent-location: kjmpost: post lmd
              last process posted by me: 700000ad86f6730 1 6
  (latch info) wait_event=0 bits=2
    holding    (efd=16) 700000ad1a9ee98 Child ges resource hash list level=1 child#=1254
        Location from where latch is held: kjrmas1: lookup master node:
        Context saved from call: 504403196568791072
        state=busy, wlstate=free
          waiters [orapid (seconds since: put on list, posted, alive check)]:
           1064 (0, 1433476592, 0)
           1051 (0, 1433476592, 0)
           1058 (0, 1433476592, 0)
           1048 (0, 1433476592, 0)
           1050 (0, 1433476592, 0)
           1055 (0, 1433476592, 0)
           1031 (0, 1433476592, 0)
           waiter count=7
    Process Group: DEFAULT, pseudo proc: 700000ad47e8d68
    O/S info: user: oracle, term: UNKNOWN, ospid: 46072128
    OSD pid info: Unix process pid: 46072128, image: oracle@anbob2 (P015)
Short stack dump:
ksdxfstk+002c<-ksdxcb+0500<-sspuser+0074<-000048BC<-kjskchcv+00f0<-kjuocl+0750<-kjusuc+05d0<-ksipgetctx+04fc<-ksqcmi+1b0c<-ksqgtlctx+1214<-ksqgelctx+0358
TIP:
During the problem window PMON was indeed busy: it could not get the latch ges resource hash list. The blocker was the process with pid = 1066, spid = 46072128.
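Pulling the suspected holder out of a PMON trace can be scripted. This is a minimal sketch that assumes the trace contains a line of the form "possible holder pid = N ospid=M", as in the dump above; the function name latch_holder_from_trace is hypothetical.

```shell
# Hypothetical helper: read a PMON trace on stdin and print
# "<oracle pid> <os pid>" for each "possible holder" line found.
latch_holder_from_trace() {
  sed -n 's/.*possible holder pid = \([0-9][0-9]*\) ospid=\([0-9][0-9]*\).*/\1 \2/p'
}

# Example; the trace file name is a placeholder:
#   latch_holder_from_trace < /path/to/pmon_trace.trc
```

The OS pid can then be checked with ps, and the Oracle pid mapped to a session through v$process and v$session.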
[oracle@anbob2:/home/oracle]#ps -ef|grep 46072128 oracle 19071202 45220332 0 15:07:58 pts/1 0:00 grep 46072128 oracle 46072128 1 10 Jun 03 - 311:14 ora_p015_anbobc2 SQL> @uopid 1066 USERNAME SID AUDSID OSUSER MACHINE PROGRAM PID SPID SQL_HASH_VALUE STATUS ----------------------- -------------- ----------- ---------------- ------------------ -------------------- ---------- ------------ -------------- -------- REPORT '3609,36916' 262269818 report kybb1 (P015) 1066 46072128 1746131492 ACTIVE SQL> @usid 3609 USERNAME SID AUDSID OSUSER MACHINE PROGRAM SPID OPID CPID SQL_ID HASH_VALUE LASTCALL STATUS SADDR PADDR TADDR LOGON_TIME ----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------ ------------- ----------- ---------- -------- ---------------- ---------------- ---------------- ----------------- REPORT '3609,36916' 262269818 report kybb1 (P015) 46072128 1066 46072128 8av2z2xn17qj4 1746131492 218808 ACTIVE 0700000AD4AA4250 0700000AD571B5E8 0700000A99E44CB8 20150603 02:24:40 SQL> @xi 8av2z2xn17qj4 % eXplain the execution plan for sqlid 8av2z2xn17qj4 child %... 
PLAN_TABLE_OUTPUT --------------------------------------------------------------------------------------------- SQL_ID 8av2z2xn17qj4, child number 0 ------------------------------------- create table xxxx_315 nologging as select /*+use_hash(a b) parallel(a 8) no_index(a)*/ 20150602 cycle,315 region,subsid usernum,acctid,nvl(productgroup,'PrdGrpGlobal') productgroup,nvl(productid,'gl.g.nml') productid xxxxx --省略 Plan hash value: 2866634249 ------------------------------------------------------------------------------------------------------------ | Id | Operation | Name | E-Rows | Pstart| Pstop | OMem | 1Mem | Used-Mem | ------------------------------------------------------------------------------------------------------------ | 0 | CREATE TABLE STATEMENT | | | | | | | | | 1 | PX COORDINATOR | | | | | | | | | 2 | PX SEND QC (RANDOM) | :TQ10002 | 154M| | | | | | | 3 | LOAD AS SELECT | | | | | 256K| 256K| | |* 4 | HASH JOIN OUTER | | 154M| | | 2047M| 113M| | | 5 | PX RECEIVE | | 154M| | | | | | | 6 | PX SEND HASH | :TQ10000 | 154M| | | | | | | 7 | PX BLOCK ITERATOR | | 154M| 6 | 6 | | | | |* 8 | TABLE ACCESS FULL| xxxx_315 | 154M| 6 | 6 | | | | | 9 | PX RECEIVE | | 1 | | | | | | | 10 | PX SEND HASH | :TQ10001 | 1 | | | | | | | 11 | PX BLOCK ITERATOR | | 1 | 34 | 34 | | | | |* 12 | TABLE ACCESS FULL| SUBSCRIBER | 1 | 34 | 34 | | | | ------------------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 4 - access("XXXX"."REGION"="SUBSCRIBER"."REGION" AND "XXXX"."SUBSID"="SUBSCRIBER"."OID") 8 - access(:Z>=:Z AND :Z<=:Z) filter(("REGION"=315 AND "WRTOFFSN"=0)) 12 - access(:Z>=:Z AND :Z<=:Z) filter(("CYCLE"=201505 AND "REGION"=315)) SQL> @px Show current Parallel Execution sessions in RAC cluster... 
QC_SID QCINST_ID USERNAME SQL_ID DEGREE REQ_DEGREE SLAVES INST_CNT MIN_INST MAX_INST ------------- ---------- ---------- ------------------ ---------- ---------- ---------- ---------- ---------- ---------- 3864,47259 2 REPORT 8av2z2xn17qj4 8 8 16 1 2 2 3864,47259 2 REPORT 19gg34w9r4r5m 8 8 8 1 2 2 3560,39391 2 SYS 6643dd2jtv7jh 2 2 2 2 1 2 SQL> @usid 3864 USERNAME SID AUDSID OSUSER MACHINE PROGRAM SPID OPID CPID SQL_ID HASH_VALUE LASTCALL STATUS SADDR PADDR TADDR LOGON_TIME ----------------------- -------------- ----------- ---------------- ------------------ -------------------- -------------- ------ ------------ ------------- ----------- ---------- -------- ---------------- ---------------- ---------------- ----------------- REPORT '3864,47259' 262269818 report kybb1 (TNS V1-V3) 27721980 544 27156 8av2z2xn17qj4 1746131492 220783 ACTIVE 0700000AD8AFDB78 0700000AD271DA38 0700000A9E8B74D8 20150603 02:22:13
Check the current active sessions (FOREGROUND).
USERNAME USID MACHINE PROGRAM EVENT LAST_CALL_ET SQL_ID WT_SECINW SQLTEXT ---------- ---------- ---------- -------------- -------------------- ------------ ------------------ ---------- ------------------------------ ACCOUNT 4140 kdbdsc4 intf_accsrv@kd db file sequential r 0 4t9v2u4zdspxd 0:0 select a.scoretypeid,to_char(s ACCOUNT 3221 kdbdsc3 intf_accsrv@kd db file sequential r 0 4t9v2u4zdspxd 0:0 select a.scoretypeid,to_char(s USERINFO 3889 kycbe13 JDBC Thin Clie SQL*Net message from 0 33hk2fjysmd3w -1:0 update ENTITY_SYNC_LOG set sen REPORT 4158 kybb1 task@kybb1 (TN latch: ges resource 3723 2mk5dt5xjg7zt 0:0 INSERT INTO TB_BALANCE_FLOW_LO REPORT 3819 kybb1 task@kybb1 (TN latch: ges resource 3723 6n30d53a4ts94 0:0 insert into tb_collect_log_red REPORT 4110 kybb1 task@kybb1 (TN latch: ges resource 3723 6n30d53a4ts94 0:0 insert into tb_collect_log_red REPORT 4169 kybb1 task@kybb1 (TN latch: ges resource 3723 2mk5dt5xjg7zt 0:0 INSERT INTO TB_BALANCE_FLOW_LO REPORT 4042 kybb1 task@kybb1 (TN latch: ges resource 3848 acn50cgh015ba 0:0 insert into tb_user_charge_d REPORT 4191 kybb1 task@kybb1 (TN latch: ges resource 3848 d9hjdb4nkg18t 0:0 insert into tb_user_charge_d REPORT 3304 kybb1 task@kybb1 (TN latch: ges resource 139542 b5anr2qjt2h9k 2:0 insert /*+append*/into stat_ba REPORT 4002 kybb1 task@kybb1 (TN latch: ges resource 172083 cfyrg0hs1qh3a 0:0 UPDATE SUBSCRIBER S SET S.VIPI REPORT 3584 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315 REPORT 3718 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315 REPORT 3285 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315 REPORT 3394 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315 REPORT 3830 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315 REPORT 3652 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315 REPORT 
3614 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 27:0 create table xxxx_315 REPORT 3609 kybb1 oracle@anbob2 latch: ges resource 219101 8av2z2xn17qj4 0:0 create table xxxx_315 REPORT 3864 kybb1 task@kybb1 (TN PX Deq: Execute Repl 219248 8av2z2xn17qj4 0:0 create table xxxx_315 REPORT 3596 kybb1 oracle@anbob2 PX Deq: Execution Ms 219247 19gg34w9r4r5m 0:1 SELECT * FROM REGION_LIST WHER REPORT 3850 kybb1 oracle@anbob2 PX Deq: Execution Ms 219247 19gg34w9r4r5m 0:2 SELECT * FROM REGION_LIST WHER SYS 3560 anbob2 sqlplus@anbob2 SQL*Net message to c 3 d2b6tdq5jmju3 -1:3 select /*+rule*/ ses.use
Tip:
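Among the currently connected sessions, only two programs are waiting on latch: ges resource, and one of them is a create table statement running with 8 parallel processes. That session has been executing for almost 220,000 seconds. Next, use Tanel Poder's script to check for cascading latch blocking.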
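To put the LAST_CALL_ET figure into perspective, the seconds can be converted into days, hours, minutes, and seconds. The helper name fmt_elapsed is hypothetical; the input 219101 is the LAST_CALL_ET value shown for the create table sessions.

```shell
# Hypothetical helper: format a duration given in seconds as "Dd HHh MMm SSs".
fmt_elapsed() {
  printf '%dd %02dh %02dm %02ds\n' \
    $(( $1 / 86400 )) \
    $(( $1 % 86400 / 3600 )) \
    $(( $1 % 3600 / 60 )) \
    $(( $1 % 60 ))
}

fmt_elapsed 219101   # prints: 2d 12h 51m 41s
```

Two and a half days matches the session's logon time of 03-Jun 02:22 against the investigation on the afternoon of 05-Jun.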
SQL> @ash_wait_chains session_id||':'||program2||event2 session_type='FOREGROUND' sysdate-10/24/60 sysdate -- Display ASH Wait Chain Signatures script v0.2 BETA by Tanel Poder ( http://blog.tanelpoder.com ) %This SECONDS AAS WAIT_CHAIN ------ ---------- ---------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ 1% 599 1 -> 3761:(task) latch: ges resource hash list 1% 599 1 -> 3797:(task) ON CPU 1% 597 1 -> 3474:(intunit_get_sq) library cache lock 1% 597 1 -> 3869:(task) library cache lock 1% 597 1 -> 3303:(task) library cache lock 1% 574 1 -> 3261:(sqlplus) SQL*Net message from dblink 1% 527 .9 -> 4169:(task) latch: ges resource hash list 1% 524 .9 -> 3464:(sqlplus) SQL*Net message from dblink 1% 523 .9 -> 3718:(Pnnn) latch: ges resource hash list 1% 521 .9 -> 3652:(Pnnn) latch: ges resource hash list 1% 521 .9 -> 4002:(task) latch: ges resource hash list 1% 521 .9 -> 3584:(Pnnn) latch: ges resource hash list 1% 521 .9 -> 3304:(task) latch: ges resource hash list 1% 518 .9 -> 3830:(Pnnn) latch: ges resource hash list 1% 517 .9 -> 4299:(sqlplus) db file sequential read 1% 511 .9 -> 3394:(Pnnn) latch: ges resource hash list 1% 510 .9 -> 3609:(Pnnn) latch: ges resource hash list 1% 509 .8 -> 4191:(task) latch: ges resource hash list 1% 509 .8 -> 3780:(sqlplus) db file sequential read 1% 507 .8 -> 4110:(task) latch: ges resource hash list 1% 506 .8 -> 3614:(Pnnn) latch: ges resource hash list 1% 504 .8 -> 4158:(task) latch: ges resource hash list 1% 503 .8 -> 3301:(sqlplus) db file sequential read 1% 502 .8 -> 3895:(sqlplus) ON CPU 1% 499 .8 -> 3285:(Pnnn) latch: ges resource hash list 1% 497 .8 -> 3819:(task) latch: ges resource hash list 1% 496 .8 -> 4042:(task) latch: ges resource hash list 1% 471 .8 -> 4193:(sqlplus) 
SQL*Net message from dblink 1% 441 .7 -> 2975:(oracle) db file sequential read 1% 344 .6 -> 2928:(sqlplus) db file scattered read 30 rows selected. SQL> @uspid 3867094 USERNAME SID AUDSID OSUSER MACHINE PROGRAM SPID SQL_HASH_VALUE LASTCALL STATUS ----------------------- -------------- ----------- ---------------- ------------------ -------------------- ------------ -------------- ---------- -------- '4405,1' 0 oracle anbob2 oracle@anbob2 (PMON) 3867094 0 1865174 ACTIVE SQL> oradebug setospid 3867094 Oracle pid: 2, Unix process pid: 3867094, image: oracle@anbob2 (PMON) SQL> oradebug short_stack ksdxfstk+002c<-ksdxcb+0500<-sspuser+0074<-000048BC<-sskgpwwait+0034<-skgpwwait+00bc<-ksliwat+06c0<-kslwaitns_timed+0024< -kskthbwt+0280<-kslewat+01bc<-kslges+04b8<-kslgetl+0384<-kjuscl+0390<-ksiprls+0178<-ksqcmi+2a9c<-ksqrcl+051c<-ksqdeli+014c <-ksqdel+0014<-ksqsod+00d4<-kssxdl+0350<-kssdch_stage+05d4<-kssdch+0014<-ksudlc+01fc<-kssxdl+0350<-kssdch_stage+05d4<-kssdch+0014 <-ksudlc+01fc<-kssxdl+0350<-kssdch_stage+05d4<-kssdch+0014<-ksudlc+01fc<-kssxdl+0350<-ksudlp+018c<-kssxdl+0350<-ksuxdl+02a0 <-ksuxda+028c<-ksucln+085c<-ksbrdp+04ec<-opirip+041c<-opidrv+0478<-sou2o+0090<-opimai_real+0150<-main+0098<-__start+0070 SQL> @s 4405 SID SQLID_AND_CHILD STATUS STATE EVENT SEQ# SEC_IN_WAIT BLOCKING_SID P1 P2 P3 P1TRANSL ------- -------------------- -------- ------- ---------------------------------------- ---------- ----------- ------------ ------------------ ------------------ ------------------ ------------------------------------------ 4405 0 ACTIVE WAITING latch: ges resource hash list 33650 0 UNKNOWN address= number= 66 tries= 39 0x700000AD1A9EE98: ges resource hash list[ # awr Snap Id Snap Time Sessions Curs/Sess --------- ------------------- -------- --------- Begin Snap: 50106 05-Jun-15 15:30:42 837 16.7 End Snap: 50107 05-Jun-15 16:00:31 838 16.8 Elapsed: 29.82 (mins) DB Time: 691.71 (mins) Top 5 Timed Events Avg %Total ~~~~~~~~~~~~~~~~~~ wait Call Event Waits Time (s) (ms) 
Time Wait Class ------------------------------ ------------ ----------- ------ ------ ---------- latch: ges resource hash list 166,505 25,783 155 62.1 Other db file sequential read 1,831,050 10,326 6 24.9 User I/O CPU time 4,056 9.8 gc current block 2-way 1,173,399 904 1 2.2 Cluster gc cr grant 2-way 847,244 468 1 1.1 Cluster -------------------------------------------------------------
NOTE:
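The output above shows that PMON is currently waiting; the call stack confirms that the wait event is 'latch: ges resource hash list', which is also the top event of the instance. I suspected it was related to the running CTAS session. A search on MOS turned up one bug related to this event: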
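To see at a glance which event dominates the ash_wait_chains output, the chain lines can be tallied per event. This is a rough sketch that assumes single-link chain lines of the form "-> SID:(program) event", as printed above; the function name count_wait_chain_events is hypothetical.

```shell
# Hypothetical helper: read ash_wait_chains output on stdin and print
# "<number of chain lines> <event>", most frequent event first.
count_wait_chain_events() {
  awk '
    / -> [0-9]+:\(/ {
      line = $0
      # Strip everything up to and including "SID:(program) ", leaving the event.
      sub(/^.* -> [0-9]+:\([^)]*\) */, "", line)
      n[line]++
    }
    END { for (e in n) printf "%d %s\n", n[e], e }
  ' | sort -k1,1nr -k2
}

# Example: spool the script output to a file, then
#   count_wait_chain_events < wait_chains.txt
```

The count is per session line, not per second of wait time, so it only indicates how many sessions are affected.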
Bug 11690639 - High enqueue activity results in "latch: ges resource hash list" waits (Doc ID 11690639.8)
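Resolution: the session that had been running for more than 210,000 seconds was abnormal; after that session was killed, the wait event disappeared.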
SQL> @a
A-Script: Display active sessions...

  COUNT(*) SQL_ID        STATE   EVENT
---------- ------------- ------- ----------------------------------------------------------------
         5 4t9v2u4zdspxd WAITING db file sequential read
         2 9z11d7hxfb1u6 WAITING db file sequential read
         2 2mk5dt5xjg7zt WAITING db file sequential read
         1 4t9v2u4zdspxd WAITING gc cr request
         1 7gtuunjsd8cws WAITING db file sequential read

SQL> @s 4405

    SID SQLID_AND_CHILD      STATUS   STATE   EVENT                                          SEQ# SEC_IN_WAIT BLOCKING_SID P1                 P2                 P3                 P1TRANSL
------- -------------------- -------- ------- ---------------------------------------- ---------- ----------- ------------ ------------------ ------------------ ------------------ ------------------------------------------
   4405 0                    ACTIVE   WORKING On CPU / runqueue                             38981          19 UNKNOWN
NOTE:
PMON's current event is now 'On CPU', so it is back to normal. Checking the listener next, dynamic registration has recovered as well.
[oracle@anbob2:/home/oracle]#/oracle/app/11.2.0.3/grid/bin/lsnrctl status

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 17:21:10
Copyright (c) 1991, 2011, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
STATUS of the LISTENER
------------------------
Alias                     LISTENER
Version                   TNSLSNR for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production
Start Date                05-JUN-2015 12:07:11
Uptime                    0 days 5 hr. 13 min. 59 sec
Trace Level               off
Security                  ON: Local OS Authentication
SNMP                      ON
Listener Parameter File   /oracle/app/11.2.0.3/grid/network/admin/listener.ora
Listener Log File         /oracle/app/grid/diag/tnslsnr/anbob2/listener/alert/log.xml
Listening Endpoints Summary...
  (DESCRIPTION=(ADDRESS=(PROTOCOL=ipc)(KEY=LISTENER)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.16)(PORT=1521)))
  (DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=133.96.60.116)(PORT=1521)))
Services Summary...
Service "anbobc.com" has 2 instance(s).
  Instance "anbobc2", status UNKNOWN, has 1 handler(s) for this service...
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
Service "anbobc_XPT.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
The command completed successfully

[oracle@anbob2:/home/oracle]#/oracle/app/11.2.0.3/grid/bin/lsnrctl service

LSNRCTL for IBM/AIX RISC System/6000: Version 11.2.0.3.0 - Production on 05-JUN-2015 17:21:29
Copyright (c) 1991, 2011, Oracle. All rights reserved.

Connecting to (ADDRESS=(PROTOCOL=tcp)(HOST=)(PORT=1521))
Services Summary...
Service "anbobc.com" has 2 instance(s).
  Instance "anbobc2", status UNKNOWN, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:1969 refused:0
         LOCAL SERVER
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:128 refused:0 state:ready
         LOCAL SERVER
Service "anbobc_XPT.com" has 1 instance(s).
  Instance "anbobc2", status READY, has 1 handler(s) for this service...
    Handler(s):
      "DEDICATED" established:128 refused:0 state:ready
         LOCAL SERVER
The command completed successfully
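Summary:
Whether this bug is really the cause cannot be confirmed yet. What is certain is that the CTAS held latch: ges resource hash list for a long time and blocked PMON from acquiring it, which affected registration with the listener and service_update. The listener's session count only grew and was never decremented for released sessions, so the service state eventually became blocked and sessions connecting through the listener failed with ORA-12519 and ORA-12516.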