Troubleshooting ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT
最近有家银行客户一套核心凌晨跑批时报出了ORA-04036,与12c后增加的PGA_AGGREGATE_LIMIT有关,环境oracle RAC 12.1.0.2 on AIX, 临时增加了PGA_AGGREGATE_LIMIT参数大小解决,事后找我分析原因,简单记录一下分析思路。
DB ALERT LOG
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production
Sun May 14 00:46:50 2023
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_ora_42795290.trc (incident=1262543):
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT <<<<<<<<<<<<<<<<<<<<<<
Fatal NI connect error 12537, connecting to:
(LOCAL=NO)
Time: 14-MAY-2023 00:46:56
Sun May 14 00:46:52 2023
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_ora_30409770.trc (incident=1262535):
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT
Sun May 14 00:46:55 2023
Errors in file /oracle/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_ora_35389838.trc (incident=1260031):
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT
Sun May 14 00:47:37 2023
PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT of 26368 MB <<<<<<<<<<<<<<<<<<
VERSION INFORMATION:
TNS for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production
TCP/IP NT Protocol Adapter for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production
Oracle Bequeath NT Protocol Adapter for IBM/AIX RISC System/6000: Version 12.1.0.2.0 - Production
Thread 2 advanced to log sequence 155965 (LGWR switch)
Current log# 6 seq# 155965 mem# 0: +DATA/anbob/redo61.log
Current log# 6 seq# 155965 mem# 1: +DATA/anbob/redo62.log
Sun May 14 00:55:57 2023
Archived Log entry 512603 added for thread 2 sequence 155964 ID 0x35ebc53e dest 1:
Sun May 14 01:02:30 2023
ALTER SYSTEM SET pga_aggregate_limit=30G SCOPE=BOTH SID='*'; <<<<<<<<<<<<<<<<<<
Sun May 14 01:25:52 2023
数据库参数
memory_target 0
pga_aggregate_target 13G
PGA_AGGREGATE_LIMIT 25G
ora-4036
[oracle@oel7db1 ~]$ oerr ora 4036 04036, 00000, "PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT" // *Cause: Private memory across the instance exceeded the limit specified // in the PGA_AGGREGATE_LIMIT initialization parameter. The largest // sessions using Program Global Area (PGA) memory were interrupted // to get under the limit. // *Action: Increase the PGA_AGGREGATE_LIMIT initialization parameter or reduce // memory usage.
PGA_AGGREGATE_LIMIT
大多数数据库都在使用自动 PGA 内存管理,一旦 PGA_AGGREGATE_TARGET 参数设置为非零值,就会启用该管理。一个常见的误解是,此设置实际上限制了实例的整体 PGA 使用。事实是,自动PGA内存管理将尝试遵守PGA_AGGREGATE_TARGET值。这些是 Oracle 官方文档中的实际用语:“尝试遵守”!这意味着排序内存、散列内存和位图内存将通过自动 PGA 内存管理主动限制每个进程的大小,任何分配超过自动 PGA 内存管理允许的尝试都将导致将这些内存区域的某些内容移动到数据库用户的分配临时表空间,以便为新数据腾出空间。但是,每个进程有更多的内存区域可分配,这些内存区域永远不会交换到磁盘,因此将始终保留在内存中,并且在 Oracle 版本 12 之前无法以官方支持的方式限制这些内存区域。在 PGA 中分配且从未交换到磁盘的两个结构是 PL/SQL 集合和 PL/SQL 表,创建和填充这些需要使用PL / SQL,这意味着开发人员可以创建这样的结构,并分配他/她喜欢的任何内容,以及它可能产生的所有后果,Oracle 在 Oracle 12 中引入了一个名为 PGA_AGGREGATE_LIMIT 的参数,以有效地限制整体 PGA 堆大小。
关于PGA_AGGREGATE_LIMIT参数在我之前的blog有记录《Know more about PGA_AGGREGATE_LIMIT 12c 19c》
Frits Hoogland曾分享过测试突破PGA_AGGREGATE_TARGET的方法
exec dbms_random.seed('abracadabra'); create table t2 as with generator as ( select rownum id from dual connect by rownum <= 1000 ) select rownum id, trunc((rownum-1)/50) clustered, mod(rownum,20000) scattered, trunc(dbms_random.value(0,20000)) randomized, trunc(sysdate) + dbms_random.value(-180, 180) random_date, dbms_random.string('l',6) random_string, lpad(rownum,10,0) vc_small, rpad('x',100,'x') vc_padding from generator g1, generator g2 where rownum <= 1000000 ; exec dbms_stats.gather_table_stats(null,'T2'); declare type sourcetab is table of t2%ROWTYPE; c_tmp sourcetab; c_def sourcetab := sourcetab(); v_b_p number := 0; v_c_p number := 0; v_b_u number := 0; v_c_u number := 0; grow_until number := 700000000; p_a_t number; begin select value into v_b_p from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session pga memory max'; select value into v_b_u from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session uga memory max'; select value into p_a_t from v$parameter where name = 'pga_aggregate_target'; select * bulk collect into c_tmp from t2; while v_c_p < grow_until loop for c in c_tmp.first .. c_tmp.last loop c_def.extend(1); c_def(c_def.last) := c_tmp(c); select value into v_c_p from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session pga memory max'; select value into v_c_u from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session uga memory max'; if v_c_p >= grow_until then exit; end if; end loop; end loop; dbms_output.put_line('vbp : '||v_b_p); dbms_output.put_line('vcp : '||v_c_p); dbms_output.put_line('vbu : '||v_b_u); dbms_output.put_line('vcu : '||v_c_u); dbms_output.put_line('pat : '||p_a_t); end; /
配置pga_aggregate_target一个较小的值上面可以测试超过pga_aggregate_target, 12c后引入pga_aggregate_limit限制比如尝试分配 900MB。这意味着“grow_until”变量必须设置为 900000000。
请注意,这是作为普通用户完成的,SYS 用户和作业队列进程以外的后台进程不受限制。
当超过pga_aggregate_limit时会出现ora-4036. 请注意,PGA_AGGREGATE_LIMIT似乎真正限制了 PGA 使用实例的范围,而不是像事件 10261(Oracle 11.2 的 PGA 使用限制方式)那样限制每个进程的 PGA 堆。
alter system set event = '10261 trace name context forever, level 600000' scope=spfile; System altered.
ora-4036 trace
.\anbob2\tfa\anbobdb02\diag\rdbms\anbob\anbob2\incident *** 2023-05-14 00:47:45.018 *** SESSION ID:(3526.21157) 2023-05-14 00:47:45.018 *** CLIENT ID:() 2023-05-14 00:47:45.018 *** SERVICE NAME:(anbob) 2023-05-14 00:47:45.018 *** MODULE NAME:(JDBC Thin Client) 2023-05-14 00:47:45.018 *** CLIENT DRIVER:(jdbcthin) 2023-05-14 00:47:45.018 *** ACTION NAME:() 2023-05-14 00:47:45.018 [TOC00000] Jump to table of contents Dump continued from file: /oracle/app/oracle/diag/rdbms/anbob/anbob2/trace/anbob2_ora_30409770.trc [TOC00001] ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT [TOC00001-END] *** 2023-05-14 00:47:52.459 [TOC00002] ========= Dump for incident 1262535 (ORA 4036) ======== *** 2023-05-14 00:47:52.459 dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0) [TOC00003] ----- Current SQL Statement for this session (sql_id=2cpwnwjvuzxx8) ----- SELECT * FROM kdpl_***mx WHERE jiluztai = '0' and ****GHAO = :1 and ****LEIX = :2 and ****XIRQ = :3 and ( TZSHYNBZ = '1' or ( LIX***M = 'ADJUSTMENT' and TZSHYNBZ = '2' ) ) and farendma = :4 PRIVATE MEMORY SUMMARY FOR THIS PROCESS --------------------------------------- ****************************************************** PRIVATE HEAP SUMMARY DUMP 1610 MB total: 1606 MB commented, 680 KB permanent 3834 KB free (0 KB in empty extents), 1604 MB, 2 heaps: "callheap " *** 2023-05-14 00:55:38.255 ------------------------------------------------------ Summary of subheaps at depth 1 1580 MB total: 1409 MB commented, 171 MB permanent 138 KB free (0 KB in empty extents), 1388 MB, 325263 heaps: "qmu subheap " <<< ------------------------------------------------------ Summary of subheaps at depth 2 1362 MB total: 70 KB commented, 378 MB permanent 983 MB free (650 MB in empty extents), *** 2023-05-14 00:55:39.513 sending 4036 interrupt DDE: Problem Key 'ORA 4036' was flood controlled (0x6) (incident: 1262383) ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ... dbgePostErrorKGE()+ call dbgeExecuteForError 000000000 ? 000000000 ? dbkePostKGE_kgsf()+ call dbgePostErrorKGE() 000020000 ? 121657688 ? 64 FC4FFFF2170 ? 1218C4BF8 ? kgeade()+408 call dbkePostKGE_kgsf() 00001BF00 ? 700000000004940 ? kgeselv()+96 call kgeade() FFFFFFFFFFF22E0 ? ksesecl0()+84 call kgeselv() 1049672B4 ? 700000000004940 ? ksm_4036_interrupt( call ksesecl0() 111A58C20 ? 00001BEDC ? )+80 FC400000FC4 ? FFFFFFFFFFFFFFFF ? ksm_check_over_limit call ksm_4036_interrupt( BE000000BE ? 100000001 ? () ksmarfg()+248 call ksm_check_over_limi 000000000 ? FFFFFFFFFFF2600 ? t() 00000000C ? 000010000 ? kghgex()+1464 call ksmarfg() FFFFFFFFFFF6A90 ? 700010FC1DADD08 ? kghfnd()+1092 call kghgex() FFFFFFFFFFF2500 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? kghalo()+2996 call kghfnd() 121263568 ? 121263460 ? 121263568 ? 121657688 ? 121657688 ? 000001058 ? 000000000 ? 000000000 ? kghgex()+928 call kghalo() FFFFFFFFFFF2A70 ? 700000000014A30 ? FFFFFFFFFFF2AA0 ? 44224248000001E0 ? FFFFFFFFFFF2AB0 ? 15C26BF78 ? 121646978 ? 000010000 ? kghfnd()+1092 call kghgex() 000000000 ? 000000000 ? 121644E68 ? 121657688 ? FFFFFFFFFFF2C30 ? 4222402021657688 ? 104131E84 ? 000000FF8 ? kghalo()+2996 call kghfnd() 121644EB4 ? 000000240 ? 000000000 ? 121657688 ? 112CF40A0 ? 121644E68 ? FFFFFFFFFFF2D20 ? 4222402000000000 ? kghgex()+928 call kghalo() 000000000 ? 000000000 ? 000000030 ? 000000240 ? 000000FE8 ? 000000000 ? 15C26C54C ? 121657764 ? kghfnd()+1092 call kghgex() 000000000 ? 000000000 ? 15C26C500 ? 121657688 ? FFFFFFFFFFF3010 ? 4222202000000001 ? 104131E84 ? 000000FB8 ? kghalo()+2996 call kghfnd() 15C26C54C ? 0000000A0 ? 000000000 ? 121657688 ? kghgex()+928 call kghalo() 700010E5EC86380 ? 000000000 ? kghalf()+1496 call kghgex() 000000001 ? 000000001 ? kxsSegAllocate()+16 call kghalf() 000000000 ? 15C26C500 ? kxsFrameSegAllocate call kxsSegAllocate() 000000005 ? 000000000 ? ()+92 000000001 ? 0000000F1 ? kxsEphemeralFramePush call kxsFrameSegAllocate 10AD30DF8 ? 1120155AC ? ()+196 () FFFFFFFFFFF35E0 ? 1215C34B8 ? ABC00000ABC ? 100000001 ? kkeutlEvalOpn()+184 call kxsEphemeralFramePu 000000002 ? 000000000 ? sh() FFFFFFFFFFF36B0 ? 000000141 ? 35A4252FFFFF3CC8 ? kkeutlCompareConstants call kkeutlEvalOpn() 000000000 ? 000000058 ? ()+552 FFFFFFFFFFF3A60 ? 000000000 ? 15C274088 ? 700010E5EC86380 ? 000008000 ? 000000070 ? kkeutlCompHistActVals call kkeutlCompareConsta 121646938 ? 121657688 ? ()+128 nts() FFFFFFFFFFF3A60 ? 0645FBEF4 ? 59CB9FFFF38A0 ? 000000000 ? 000000007 ? 1215C34B8 ? kkehbs()+1212 call kkeutlCompHistActVa FFFFFFFFFFF3930 ? ls() 700010FD934CD38 ? 000000000 ? kkeh2sl_Int()+344 call kkehbs() 000000004 ? 000000001 ? FFFFFFFFFFF4950 ? 121A12548 ? 000000000 ? 000000001 ? 000000004 ? 000000001 ? kkehsl_Int()+4016 call kkeh2sl_Int() 10A2FDEF0 ? 000000003 ? 000000001 ? 000000008 ? kkehslCursorMatch() call kkehsl_Int() FFFFFFFFFFF3D10 ? 000002000 ? +88 1049D1324 ? 0000001B8 ? kkocsPredSel()+744 call kkehslCursorMatch() 10547A464 ? 1124FE518 ? FFFFFFFFFFF3E60 ? 000000000 ? 121644EB4 ? 000000000 ? 15C273F70 ? 700010FC994E3A8 ? kkscsVerifyBindEquivRP call kkocsPredSel() 000000000 ? 000000038 ? ()+164 000000010 ? 3FEFD417E514BD2E ? 121931B30 ? 000000010 ? 000000248 ? 000000000 ? kkscsVerifyBindEquivalence call kkscsVerifyBindEqui 020000000 ? 000000010 ? ()+92 vRP() FFFFFFFFFFF4040 ? 1000000000001 ? FFFFFFFFFFF4050 ? 121931B30 ? 000000002 ? 700010ED6572B78 ? kkscsMatchEquivLiterals call kkscsVerifyBindEqui FFFFFFFFFFF40E0 ? ()+484 valence() 700010E7C0FBAA8 ? 700010ED3CC6CC8 ? 000000003 ? 010008110 ? 700010E7C0FBAA8 ? FFFFFFFFFFF40E0 ? 100000003 ? kkscsCompareBinds() call kkscsMatchEquivLite 000000000 ? 000000000 ? +1796 rals() 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? 000000000 ? kkscscid_bnd_eval() call kkscsCompareBinds() 000000000 ? 000000000 ? +348 10200000102 ? 100000001 ? FFFFFFFFFFF51F0 ? 000000000 ? 1124FE924 ? 1124FE714 ? kkscsCheckCriteria( call kkscscid_bnd_eval() 121657688 ? 121A125C0 ? )+1624 FFFFFFFFFFF4470 ? FFFFFFFFFFF4DE0 ? FFFFFFFFFFF4470 ? 000000005 ? 700010E6F2DBB88 ? 3215C34B8 ? kkscsCheckCursor()+ call kkscsCheckCriteria( 121657688 ? 121A125C0 ? 696 ) 121A125C0 ? 000000000 ? 500000000000000 ? 000000000 ? 000000000 ? 000000000 ? kkscsSearchChildList() call kkscsCheckCursor() 121657688 ? FFFFFFFFFFF4DE0 ? +904 121A125C0 ? 102FFFF4B30 ? FFFFFFFFFFF49C0 ? FFFFFFFFFFF4D18 ? 700010ED3CC7600 ? FFFFFFFFFFF4960 ? kksfbc()+21028 call kkscsSearchChildLis FFFFFFFFFFF5240 ? 121A125C0 ? t() 121A125C0 ? 000000000 ? FFFFFFFFFFF4C30 ? 42822028FFFF52C8 ? 104F07BD0 ? 000000000 ? opiexe()+2196 call kksfbc() 121A125C0 ? 300000000 ? 102FFFF5CB0 ? 000000000 ? 000000000 ? FFFFFFFFFFF6258 ?
报错的进程使用了1.5g heap, 此进程中较多的是”qmu subheap ” qm是与XDB相关的,以”xml”为关键字查找trace
行 13022: sword prmsmon_bindsxmlfmt_ [1215C6AA8, 1215C6AAC) = 00000C7E 行 53595: koidf qmxmlschoid_ [700010FE066D418, 700010FE066D428) = 00000000 00000000 ...
note:
排除了表列有与xml相关的类型,判断应该是与此SQL解析相关,关键字像是绑定变量的xml。call stack中提到了sql cursor相关的。
与XML 和ORA-4036相关的 known issues
For Oracle Versions BELOW 23.1
Apply patch:32769880. See NOTE:32769880.8 xmlelement leads to high consumption of pga leads to ORA-4036 or ORA-4030
Apply patch:33319065. See NOTE:2849174.1 ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT while testing with XML structuresFor Oracle Versions BELOW 12.2
Apply patch:17817258. See NOTE:17817258.8 Large PGA/UGA utilization for XML queries (ORA-4030 / ORA-4036)
检查SQL游标不共享的的VIEW reason
SQL> @desc gv$SQL_SHARED_CURSOR
Name Null? Type
------------------------------- -------- ----------------------------
1 INST_ID NUMBER
2 SQL_ID VARCHAR2(13)
3 ADDRESS RAW(8)
4 CHILD_ADDRESS RAW(8)
5 CHILD_NUMBER NUMBER
6 UNBOUND_CURSOR VARCHAR2(1)
7 SQL_TYPE_MISMATCH VARCHAR2(1)
8 OPTIMIZER_MISMATCH VARCHAR2(1)
9 OUTLINE_MISMATCH VARCHAR2(1)
10 STATS_ROW_MISMATCH VARCHAR2(1)
11 LITERAL_MISMATCH VARCHAR2(1)
12 FORCE_HARD_PARSE VARCHAR2(1)
13 EXPLAIN_PLAN_CURSOR VARCHAR2(1)
14 BUFFERED_DML_MISMATCH VARCHAR2(1)
15 PDML_ENV_MISMATCH VARCHAR2(1)
16 INST_DRTLD_MISMATCH VARCHAR2(1)
17 SLAVE_QC_MISMATCH VARCHAR2(1)
18 TYPECHECK_MISMATCH VARCHAR2(1)
19 AUTH_CHECK_MISMATCH VARCHAR2(1)
20 BIND_MISMATCH VARCHAR2(1)
21 DESCRIBE_MISMATCH VARCHAR2(1)
22 LANGUAGE_MISMATCH VARCHAR2(1)
23 TRANSLATION_MISMATCH VARCHAR2(1)
24 BIND_EQUIV_FAILURE VARCHAR2(1)
25 INSUFF_PRIVS VARCHAR2(1)
26 INSUFF_PRIVS_REM VARCHAR2(1)
27 REMOTE_TRANS_MISMATCH VARCHAR2(1)
28 LOGMINER_SESSION_MISMATCH VARCHAR2(1)
29 INCOMP_LTRL_MISMATCH VARCHAR2(1)
30 OVERLAP_TIME_MISMATCH VARCHAR2(1)
31 EDITION_MISMATCH VARCHAR2(1)
32 MV_QUERY_GEN_MISMATCH VARCHAR2(1)
33 USER_BIND_PEEK_MISMATCH VARCHAR2(1)
34 TYPCHK_DEP_MISMATCH VARCHAR2(1)
35 NO_TRIGGER_MISMATCH VARCHAR2(1)
36 FLASHBACK_CURSOR VARCHAR2(1)
37 ANYDATA_TRANSFORMATION VARCHAR2(1)
38 PDDL_ENV_MISMATCH VARCHAR2(1)
39 TOP_LEVEL_RPI_CURSOR VARCHAR2(1)
40 DIFFERENT_LONG_LENGTH VARCHAR2(1)
41 LOGICAL_STANDBY_APPLY VARCHAR2(1)
42 DIFF_CALL_DURN VARCHAR2(1)
43 BIND_UACS_DIFF VARCHAR2(1)
44 PLSQL_CMP_SWITCHS_DIFF VARCHAR2(1)
45 CURSOR_PARTS_MISMATCH VARCHAR2(1)
46 STB_OBJECT_MISMATCH VARCHAR2(1)
47 CROSSEDITION_TRIGGER_MISMATCH VARCHAR2(1)
48 PQ_SLAVE_MISMATCH VARCHAR2(1)
49 TOP_LEVEL_DDL_MISMATCH VARCHAR2(1)
50 MULTI_PX_MISMATCH VARCHAR2(1)
51 BIND_PEEKED_PQ_MISMATCH VARCHAR2(1)
52 MV_REWRITE_MISMATCH VARCHAR2(1)
53 ROLL_INVALID_MISMATCH VARCHAR2(1)
54 OPTIMIZER_MODE_MISMATCH VARCHAR2(1)
55 PX_MISMATCH VARCHAR2(1)
56 MV_STALEOBJ_MISMATCH VARCHAR2(1)
57 FLASHBACK_TABLE_MISMATCH VARCHAR2(1)
58 LITREP_COMP_MISMATCH VARCHAR2(1)
59 PLSQL_DEBUG VARCHAR2(1)
60 LOAD_OPTIMIZER_STATS VARCHAR2(1)
61 ACL_MISMATCH VARCHAR2(1)
62 FLASHBACK_ARCHIVE_MISMATCH VARCHAR2(1)
63 LOCK_USER_SCHEMA_FAILED VARCHAR2(1)
64 REMOTE_MAPPING_MISMATCH VARCHAR2(1)
65 LOAD_RUNTIME_HEAP_FAILED VARCHAR2(1)
66 HASH_MATCH_FAILED VARCHAR2(1)
67 PURGED_CURSOR VARCHAR2(1)
68 BIND_LENGTH_UPGRADEABLE VARCHAR2(1)
69 USE_FEEDBACK_STATS VARCHAR2(1)
70 REASON CLOB
71 CON_ID NUMBER
SQL> select reason from gv$SQL_SHARED_CURSOR where sql_id='&sqlid';
Enter value for sqlid: 9g485acn2n30m
REASON
--------------------------------------------------------------------------------
<ChildNode><ChildNumber>3</ChildNumber><ID>3</ID><reason>Optimizer mismatch(12)<
/reason><size>2x476</size><_fix_control_key> -866096202 0
</_fix_control_key></ChildNode>
<ChildNode><ChildNumber>4</ChildNumber><ID>3</ID><reason>Optimizer mismatch(12)<
/reason><size>2x476</size><_optim_peek_user_binds> true false
</_optim_peek_user_binds><_fix_control_key> 0 -86
6096202 </_fix_control_key><optimizer_adaptive_plans> true
false </optimizer_adaptive_plans></ChildNode>
Note:
注意从gv$SQL_SHARED_CURSOR的REASON列可以判断sql cursor未共享的原因的元数据组织形式是XML格式, 存在一种可能是因为SQL 的子游标过多,该数据记录过大,从trace中查找子游标个数和未共享原因:
---------------------------------------- SO: 0x700010eddfce738, type: 96, owner: 0x700010fc994d420, flag: INIT/-/-/0x00 if: 0x3 c: 0x3 proc=0x700010fd934c7f0, name=LIBRARY OBJECT LOCK, file=kgl.h LINE:9221 ID:, pg=0 conuid=0 LibraryObjectLock: Address=700010eddfce738 Handle=700010e5ae75478 Mode=N CanBeBrokenCount=1 Incarnation=1 ExecutionCount=0 User=700010fc994d420 Session=700010fc994d420 ReferenceCount=1 Flags=CNB/[0001] SavepointNum=645fbe93 LibraryHandle: Address=700010e5ae75478 Hash=77aff7a8 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD ObjectName: Name= SELECT * FROM kdpl_***mx WHERE jiluztai = '0' and ****GHAO = :1 and ****LEIX = :2 and ****XIRQ = :3 and ( TZSHYNBZ = '1' or ( LIX***M = 'ADJUSTMENT' and TZSHYNBZ = '2' ) ) and farendma = :4 FullHashValue=463c6c019d50635b265794e477aff7a8 Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=0 ContainerUid=0 Identifier=2008020904 OwnerIdn=106 Statistics: InvalidationCount=0 ExecutionCount=508901 LoadCount=2630 ActiveLocks=85 TotalLockCount=34042 TotalPinCount=15 Counters: BrokenCount=1 RevocablePointer=1 KeepDependency=541 Version=0 BucketInUse=34041 HandleInUse=34041 HandleReferenceCount=0 Concurrency: DependencyMutex=700010e5ae75528(0, 4940, 0, 0) Mutex=700010e5ae755c0(0, 16171485, 565761, 0) Flags=RON/PIN/TIM/PN0/DBN/[10012841] Flags2=[0000] WaitersLists: Lock=700010e5ae75508[700010e5ae75508,700010e5ae75508] Pin=700010e5ae754e8[700010e5ae754e8,700010e5ae754e8] LoadLock=700010e5ae75560[700010e5ae75560,700010e5ae75560] Timestamp: Current=04-29-2023 00:19:44 HandleReference: Address=700010e5ae75718 Handle=0 Flags=[00] LibraryObject: Address=700010ed3cc6c00 HeapMask=0000-0001-0001-0000 Flags=EXS[0000] Flags2=[0000] PublicFlags=[0000] ChildTable: size='960' Child: id='0' Table=700010ed3cc7a80 Reference=700010ed3cc7520 Handle=700010ed579d680 Child: id='1' Table=700010ed3cc7a80 Reference=700010ed3cc78a8 Handle=700010e5e90fd08 Child: id='2' Table=700010ed3cc7a80 Reference=700010e5ec0ec10 Handle=700010ed6962300 Child: id='3' Table=700010ed3cc7a80 Reference=700010ed4d3d670 Handle=700010e5d361d90 Child: id='4' Table=700010ed3cc7a80 Reference=700010ed4d3d978 Handle=700010ed6977838 ... Child: id='957' Table=700010e58b50108 Reference=700010e5e02a560 Handle=700010ed7378be8 NamespaceDump: [TOC00022] Parent Cursor: sql_id=2cpwnwjvuzxx8 parent=700010ed3cc6cc8 maxchild=958 plk=y ppn=n [TOC00023] CursorDiagnosticsNodes: ChildNode: ChildNumber=539 ID=40 reason=Bind mismatch(33) size=1x4 init_ranges_in_first_pass=1062054815 ChildNode: ChildNumber=538 ID=40 reason=Bind mismatch(33) size=1x4 init_ranges_in_first_pass=1062054815 ChildNode: ChildNumber=537 ID=40 reason=Bind mismatch(33) size=1x4 init_ranges_in_first_pass=1062054815
note:
当时运行的SQL有900多个子游标.
# AWR
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Wait Avg(ms) | % DB time | Wait Class |
---|---|---|---|---|---|
cursor: mutex X | 4,155,981 | 3762.2 | 0.91 | 7.9 | Concurrency |
DB CPU | 3735.3 | 7.9 | |||
cursor: pin S wait on X | 3,204 | 2468.2 | 770.35 | 5.2 | Concurrency |
log file sync | 138,319 | 2085.7 | 15.08 | 4.4 | Commit |
cursor: mutex S | 1,176,766 | 1732.5 | 1.47 | 3.6 | Concurrency |
gc current grant busy | 228,543 | 1203.8 | 5.27 | 2.5 | Cluster |
Statistic | Begin Value | End Value |
---|---|---|
logons current | 688 | 724 |
opened cursors current | 815 | 788 |
session cursor cache count | 37,925,817 | 38,235,110 |
session pga memory | 2,048,921,336 | 2,039,449,816 |
session pga memory max | 3,487,008,280 | 121,350,925,720 |
session uga memory | 249,334,744,712 | 250,180,204,264 |
session uga memory max | 1,995,725,260,856 | 2,006,723,832,288 |
问题前后的会话数并无明显增加。数据库负载上出现了sql cursor争用。分析了所有ora-4036 trace都为2cpwnwjvuzxx8 SQLid. 执行计划相同。
SQL
SQL_ID 2cpwnwjvuzxx8, child number 0 ------------------------------------- SELECT * FROM kdpl_***mx WHERE jiluztai = '0' and ****GHAO = :1 and ****LEIX = :2 and ****XIRQ = :3 and ( TZSHYNBZ = '1' or ( LIX***M = 'ADJUSTMENT' and TZSHYNBZ = '2' ) ) and farendma = :4 Plan hash value: 994448910 ------------------------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | | | 6 (100)| | |* 1 | TABLE ACCESS BY INDEX ROWID| KDPL_****MX | 1 | 297 | 6 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | KDPL_******_IDX1 | 1 | | 5 (0)| 00:00:01 | ------------------------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("SC***XIRQ"=:3 AND ("TZS**NBZ"='1' OR ("TZSHYNBZ"='2' AND "LIX**ZDM"='ADJUSTMENT')) AND "JILUZTAI"='0')) 2 - access("ZH**HAO"=:1 AND "LI**LEIX"=:2 AND "FARENDMA"=:4) filter(("LI**LEIX"=:2 AND "FARENDMA"=:4)) Note ----- - SQL profile SYS_SQLPROF_02788792a0cd003f used for this statement **************************************************************************************** OBJECT SIZE **************************************************************************************** SEGMENT OWNER NAME SEGMENT_TYPE S_SIZE --------------- ----------------------------------- ------------------ ---------- anbob KDPL_***XMX TABLE 12939 KDPL_******_IDX1 INDEX 3880 **************************************************************************************** TABLES **************************************************************************************** TABLE LOG BUFFER NUM EMPTY LAST OWNER NAME TEMP POOL D PART ROWS BLOCKS BLOCKS BLOCK_SIZE(M) AVG_SIZE(M) ANALYZED --------------- ----------------------------------- ------- ------- - ---- ------------ ------------ ------------ ------------- ----------- -------- anbob KDPL_***XMX YES.N DEFAULT 1 NO 36,813,239 1,654,071 0 12,922 11,304 20230516 TABLE TABLE Index COLUMN Col OWNER NAME Name UCPTDVS NAME Pos DESC --------------- ----------------------------------- ------------------------------ ------- ------------------------- ---- ---- anbob KDPL_***XMX KDPL_******_IDX1 UNNNNVY ZHANGHAO 1 ASC UNNNNVY XUHAOOOO 2 ASC UNNNNVY LIXIZUHO 3 ASC UNNNNVY LIXILEIX 4 ASC UNNNNVY FARENDMA 5 ASC INDEX STATS **************************************************************************************** Average Average TABLE TABLE Index LOCAL Leaf Distinct Leaf Blocks Data Blocks Cluster LAST OWNER NAME Name LOG PRE B Blks Keys NUM_ROWS Per Key Per Key Factor D ANALYZED --------------- ----------------------------------- ------------------------------ --- ------ - -------- ------------ -------- ----------- ----------- ------------ - -------- anbob KDPL_***XMX KDPL_******_IDX1 YES | 3 478,081 36,182,285 ######## 1 1 36,122,310 1 20230516
NOTE:
索引的选择率很好。
诊断脚本
如果运行时发现进程使用内存高 # PGA使用率 select round(pga_tar / 1024 / 1024, 1) pga_soft_limit, round(pga_limit / 1024 / 1024, 1) pga_hard_limit, round(a.pga_alloc, 1) pga_allocated, round(a.pga_used, 1) pga_used, round(a.pga_alloc / (pga_tar / 1024 / 1024), 1) * 100 || '%' pga_soft_pct, round(a.pga_alloc / (pga_limit / 1024 / 1024), 1) * 100 || '%' pga_hard_pct, max_pga_proc from (select sum(PGA_ALLOC_MEM) / 1024 / 1024 pga_alloc ,sum(pga_used_mem)/(1024*1024) pga_used,round(max(PGA_ALLOC_MEM) / 1024 / 1024,2) max_pga_proc from v$process) a, (select value pga_tar from v$parameter where name='pga_aggregate_target')b, (select value pga_limit from v$parameter where name='pga_aggregate_limit')c # top进程 col LOGON for a10 col username for a20 col machine for a20 col program for a25 trunc col module for a20 col OSUSER for a10 col SPID for a9 SELECT DECODE(TRUNC(SYSDATE - LOGON_TIME), 0, NULL, TRUNC(SYSDATE - LOGON_TIME) || ' Days' || ' + ') ||TO_CHAR(TO_DATE(TRUNC(MOD(SYSDATE-LOGON_TIME,1) * 86400), 'SSSSS'), 'HH24:MI:SS') LOGON, SID, s.SERIAL#, p.SPID , ROUND(p.pga_used_mem/(1024*1024), 2) PGA_MB_USED, ROUND(p.PGA_ALLOC_MEM/(1024*1024), 2) PGA_ALLOCed, ROUND(pga_freeable_mem/1024/1024) PGA_FREA, ROUND(pga_max_mem/1024/1024) PGA_MAX, s.USERNAME, STATUS, OSUSER, MACHINE, s.PROGRAM, sql_id FROM v$session s, v$process p WHERE s.paddr = p.addr and PGA_ALLOC_MEM>3*(1024*1024) order by PGA_ALLOC_MEM desc; select * from ( SELECT inst_id, sid , sql_id , operation_type , operation_id plan_line , ROUND(active_time/1000000,1) active_sec , actual_mem_used/1024/1024 actual_mem_used , max_mem_used /1024/1024 max_mem_used , work_area_size/1024/1024 work_area_size , number_passes , tempseg_size , tablespace FROM gv$sql_workarea_active order by ACTUAL_MEM_USED desc )where rownum<=10; # 某session进程 SELECT pid, category, allocated, used, max_allocated FROM v$process_memory WHERE pid = (SELECT pid FROM v$process WHERE addr= (select paddr FROM v$session WHERE sid = NNN)); 或 SELECT category, name, heap_name, bytes, allocation_count, heap_descriptor, parent_heap_descriptor FROM v$process_memory_detail WHERE pid = XXX --AND category = 'Other' ; 或dump alter session set events'immediate trace name PGA_DETAIL_GET level <PID>'; ORADEBUG SETMYPID; ORADEBUG DUMP PGA_DETAIL_GET <PID>;
事后分析
从DASH查看当时的情况 select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,INSTANCE_NUMBER,session_id,session_serial#,sql_id,TOP_LEVEL_SQL_ID,SQL_EXEC_START,SQL_EXEC_ID,program,PGA_ALLOCATED from TAB_ASH_TEMP_20230514 where (INSTANCE_NUMBER,session_id,SESSION_SERIAL#) in( select distinct INSTANCE_NUMBER,session_id,SESSION_SERIAL# from (select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,INSTANCE_NUMBER,session_id,SESSION_SERIAL#,program,PGA_ALLOCATED from TAB_ASH_TEMP_20230514 order by PGA_ALLOCATED desc nulls last) t where rownum<=20) order by 3,2,1; ETIME INSTANCE_NUMBER SESSION_ID SESSION_SERIAL# SQL_ID TOP_LEVEL_SQL SQL_EXEC_START SQL_EXEC_ID PROGRAM PGA_ALLOCATED ----------------- --------------- ---------- --------------- ------------- ------------- ------------------- ----------- -------------------- ---------------- 20230514 00:44:44 2 1056 45684 gr9nbg3ptgzn4 gr9nbg3ptgzn4 2023-05-14 00:44:42 33556278 JDBC Thin Client 5,607,424 20230514 00:45:14 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 124,537,856 20230514 00:46:36 2 1056 45684 88bvj239xx2ah 88bvj239xx2ah 2023-05-14 00:45:18 34818232 JDBC Thin Client 5,196,800 20230514 00:46:47 2 1056 45684 b4tcskhvduq4m b4tcskhvduq4m 2023-05-14 00:45:18 34608683 JDBC Thin Client 5,196,800 20230514 00:47:00 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 368,397,312 20230514 00:47:18 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 452,283,392 20230514 00:47:33 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 2023-05-14 00:47:23 34063265 JDBC Thin Client 506,874,880 20230514 00:47:45 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 20,007,936 20230514 00:47:57 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 234,179,584 20230514 00:48:07 2 1056 45684 66c902x0hurcj 66c902x0hurcj 2023-05-14 00:48:06 36636084 JDBC Thin Client 5,262,336 20230514 00:48:17 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 7,031,808 20230514 00:48:27 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 246,893,568 20230514 00:48:37 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 569,527,296 20230514 00:48:47 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 708,267,008 20230514 00:48:57 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 839,863,296 20230514 00:49:07 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,010,650,112 20230514 00:49:17 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,210,665,984 20230514 00:49:27 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,461,472,256 20230514 00:49:37 2 1056 45684 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,722,371,072 20230514 00:49:47 2 1056 45684 JDBC Thin Client 2,400,275,456 20230514 00:45:14 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 171,479,040 20230514 00:46:36 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 2023-05-14 00:45:18 34063041 JDBC Thin Client 284,856,320 20230514 00:46:47 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 27,627,520 20230514 00:47:00 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 150,310,912 20230514 00:47:18 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 267,751,424 20230514 00:47:33 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 2023-05-14 00:47:27 34063277 JDBC Thin Client 59,478,016 20230514 00:47:45 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 206,606,336 20230514 00:47:57 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 351,637,504 20230514 00:48:07 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 502,632,448 20230514 00:48:17 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 536,186,880 20230514 00:48:27 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 623,742,976 20230514 00:48:37 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 813,076,480 20230514 00:48:47 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 964,399,104 20230514 00:48:57 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,143,640,064 20230514 00:49:07 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,434,685,440 20230514 00:49:17 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,592,430,592 20230514 00:49:27 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,847,693,312 20230514 00:49:37 2 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 2,409,795,584 ... 20230514 00:49:07 2 5798 64663 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,272,073,216 20230514 00:49:17 2 5798 64663 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,449,348,096 20230514 00:49:27 2 5798 64663 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 1,714,506,752 20230514 00:49:37 2 5798 64663 2cpwnwjvuzxx8 2cpwnwjvuzxx8 JDBC Thin Client 2,339,589,120 182 rows selected.
Note:
显示均在并发执行2cpwnwjvuzxx8 SQL时报错,该SQL时单个会话占用pga近2.5G
其中一个会话
select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,session_id,session_serial#,sql_id,event,TOP_LEVEL_SQL_ID,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION, SQL_EXEC_START,SQL_EXEC_ID,TOP_LEVEL_CALL_NAME,PGA_ALLOCATED --,TEMP_SPACE_ALLOCATED 3 from TAB_ASH_TEMP_20230514 where session_id=1535 order by 1; ETIME SESSION_ID SESSION_SERIAL# SQL_ID EVENT TOP_LEVEL_SQL I I I I SQL_EXEC_START SQL_EXEC_ID TOP_LEVEL_CALL_ PGA_ALLOCATED ----------------- ---------- --------------- ------------- ------------------------------ ------------- - - - - ------------------- ----------- --------------- -------------------- 20230514 00:25:28 1535 17319 5nk3zc32kfcjz gc current block 2-way 5nk3zc32kfcjz N N Y N 2023-05-14 00:25:28 36706951 V8 Bundled Exec 2,247,680 20230514 00:25:39 1535 17319 5nk3zc32kfcjz gc cr block 2-way 5nk3zc32kfcjz N N Y N 2023-05-14 00:25:38 36763597 V8 Bundled Exec 2,444,288 20230514 00:45:14 1535 22916 2cpwnwjvuzxx8 cursor: pin S wait on X 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 171,479,040 20230514 00:46:36 1535 22916 2cpwnwjvuzxx8 gc current grant busy 2cpwnwjvuzxx8 N N Y N 2023-05-14 00:45:18 34063041 V8 Bundled Exec 284,856,320 20230514 00:46:47 1535 22916 2cpwnwjvuzxx8 cursor: mutex S 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 27,627,520 20230514 00:47:00 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 150,310,912 20230514 00:47:18 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 267,751,424 20230514 00:47:33 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 N N Y N 2023-05-14 00:47:27 34063277 V8 Bundled Exec 59,478,016 20230514 00:47:45 1535 22916 2cpwnwjvuzxx8 cursor: mutex X 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 206,606,336 20230514 00:47:57 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 351,637,504 20230514 00:48:07 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 502,632,448 20230514 00:48:17 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 536,186,880 20230514 00:48:27 1535 22916 2cpwnwjvuzxx8 cursor: pin S wait on X 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 623,742,976 20230514 00:48:37 1535 22916 2cpwnwjvuzxx8 cursor: mutex X 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 813,076,480 20230514 00:48:47 1535 22916 2cpwnwjvuzxx8 cursor: mutex S 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 964,399,104 20230514 00:48:57 1535 22916 2cpwnwjvuzxx8 acknowledge over PGA limit 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 1,143,640,064 20230514 00:49:07 1535 22916 2cpwnwjvuzxx8 cursor: mutex X 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 1,434,685,440 20230514 00:49:17 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 1,592,430,592 20230514 00:49:27 1535 22916 2cpwnwjvuzxx8 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 1,847,693,312 20230514 00:49:37 1535 22916 2cpwnwjvuzxx8 cursor: mutex X 2cpwnwjvuzxx8 N N Y N V8 Bundled Exec 2,409,795,584 20 rows selected.
Note:
不是存储过程,sqlexec_start为空,IN_SQL_EXECUTION执行阶段
确认这SQL的PGA使用
1 select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,session_id,session_serial#,sql_id,event,TOP_LEVEL_SQL_ID,IN_PARSE,IN_HARD_PARSE,IN_SQL_EXECUTION,IN_PLSQL_EXECUTION, 2 SQL_EXEC_START,SQL_EXEC_ID,TOP_LEVEL_CALL_NAME,PGA_ALLOCATED --,TEMP_SPACE_ALLOCATED 3* from TAB_ASH_TEMP_20230514 where sql_id='2cpwnwjvuzxx8' order by 2,1
Note:
大量SESSION和上面1535 session一样,不再附输出.
PGA使用趋势
SQL> select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,INSTANCE_NUMBER,sum(PGA_ALLOCATED)/1024/1024 mb from TAB_ASH_TEMP_20230514 group by INSTANCE_NUMBER,to_char(sample_time,'yyyymmdd hh24:mi:ss') having sum(PGA_ALLOCATED)>1024*1024*3 order by 2,1 ;
问题时间PGA确实暴增,从ASH计算需求超过了30GB。
select to_char(sample_time,'yyyymmdd hh24:mi:ss') etime,INSTANCE_NUMBER,sum(PGA_ALLOCATED)/1024/1024 mb,count(*) AAS from TAB_ASH_TEMP_20230514 where instance_number=2 group by INSTANCE_NUMBER,to_char(sample_time,'yyyymmdd hh24:mi:ss') order by 2,1 ; ETIME INSTANCE_NUMBER MB AAS ----------------- --------------- ---------- ---------- 20230514 00:43:14 2 69.0185547 1 20230514 00:43:24 2 69.0185547 1 20230514 00:43:34 2 69.0185547 1 20230514 00:43:44 2 69.0185547 1 20230514 00:43:54 2 69.8935547 1 20230514 00:44:04 2 70.0185547 1 20230514 00:44:14 2 70.0185547 1 20230514 00:44:24 2 70.0185547 1 20230514 00:44:44 2 404.989258 70 20230514 00:44:54 2 118.516602 25 20230514 00:45:04 2 260.943359 28 20230514 00:45:14 2 22514.5645 152 20230514 00:46:36 2 32874.1875 193 20230514 00:46:47 2 25770.332 188 20230514 00:47:00 2 33096.7656 136 20230514 00:47:18 2 35927.2803 135 20230514 00:47:33 2 38246.7305 135 20230514 00:47:45 2 35705.1396 122 20230514 00:47:57 2 40689.6309 115 20230514 00:48:07 2 40523.1152 111 20230514 00:48:17 2 41626.7842 110 20230514 00:48:27 2 33751.5313 111 20230514 00:48:37 2 28896.8965 72 20230514 00:48:47 2 26450.3477 57 20230514 00:48:57 2 22788.7998 47 20230514 00:49:07 2 23653.2305 41 20230514 00:49:17 2 23028.5977 30 20230514 00:49:27 2 22086.1943 26 20230514 00:49:37 2 17639.7178 14 20230514 00:49:47 2 6296.36719 7 20230514 00:49:57 2 1196.28418 5
Note:
活动会话也有增加。
PGA使用总量增加了,并发活动会话增加了,使用PGA 最多的TOP 20 session 问题时间点都在执行一个2cpwnwjvuzxx8 SQLID, 该SQL的从执行计划看效率很高,索引选择率好,个人感觉PGA使用高一种是少量个数但单个使用消耗PGA较多,另一种上执行变慢导致会话积压,我觉的可能更倾向于第二种, 以执行2cpwnwjvuzxx8的会话单个使用2GB, 25G的PGA limit有10几个会话增长起来就可能达到。 因为该问题持续时间短,从AWR的sqlstat粒度可能不好判断原因。 无论是当时DASH只使用最多PGA的会话在执行的SQL,还是当时活动会话在执行的TOP SQL都指向2cpwnwjvuzxx8, 这个SQL是在SQL的执行阶段,但活动会话又是在等待cursor: mutex x SQL游标解析类,而且还是high version,什么情况会在执行中产生子游标呢? 自适应共享游标Adaptive Cursor Sharing ACS,ECS
为什么游标不共享
SELECT INST_ID, SQL_ID, REASON_NAME, REASON_VALUE, COUNT(*) FROM ( SELECT * FROM GV$SQL_SHARED_CURSOR UNPIVOT INCLUDE NULLS (REASON_VALUE FOR REASON_NAME IN (ACL_MISMATCH AS 'ACL_MISMATCH',ANYDATA_TRANSFORMATION AS 'ANYDATA_TRANSFORMATION',AUTH_CHECK_MISMATCH AS 'AUTH_CHECK_MISMATCH',BIND_EQUIV_FAILURE AS 'BIND_EQUIV_FAILURE',BIND_LENGTH_UPGRADEABLE AS 'BIND_LENGTH_UPGRADEABLE',BIND_MISMATCH AS 'BIND_MISMATCH',BIND_PEEKED_PQ_MISMATCH AS 'BIND_PEEKED_PQ_MISMATCH',BIND_UACS_DIFF AS 'BIND_UACS_DIFF',BUFFERED_DML_MISMATCH AS 'BUFFERED_DML_MISMATCH',CROSSEDITION_TRIGGER_MISMATCH AS 'CROSSEDITION_TRIGGER_MISMATCH',CURSOR_PARTS_MISMATCH AS 'CURSOR_PARTS_MISMATCH',DESCRIBE_MISMATCH AS 'DESCRIBE_MISMATCH',DIFF_CALL_DURN AS 'DIFF_CALL_DURN',DIFFERENT_LONG_LENGTH AS 'DIFFERENT_LONG_LENGTH',EDITION_MISMATCH AS 'EDITION_MISMATCH',EXPLAIN_PLAN_CURSOR AS 'EXPLAIN_PLAN_CURSOR',FLASHBACK_ARCHIVE_MISMATCH AS 'FLASHBACK_ARCHIVE_MISMATCH',FLASHBACK_CURSOR AS 'FLASHBACK_CURSOR',FLASHBACK_TABLE_MISMATCH AS 'FLASHBACK_TABLE_MISMATCH',FORCE_HARD_PARSE AS 'FORCE_HARD_PARSE',HASH_MATCH_FAILED AS 'HASH_MATCH_FAILED',INCOMP_LTRL_MISMATCH AS 'INCOMP_LTRL_MISMATCH',INST_DRTLD_MISMATCH AS 'INST_DRTLD_MISMATCH',INSUFF_PRIVS AS 'INSUFF_PRIVS',INSUFF_PRIVS_REM AS 'INSUFF_PRIVS_REM',LANGUAGE_MISMATCH AS 'LANGUAGE_MISMATCH',LITERAL_MISMATCH AS 'LITERAL_MISMATCH',LITREP_COMP_MISMATCH AS 'LITREP_COMP_MISMATCH',LOAD_OPTIMIZER_STATS AS 'LOAD_OPTIMIZER_STATS',LOAD_RUNTIME_HEAP_FAILED AS 'LOAD_RUNTIME_HEAP_FAILED',LOCK_USER_SCHEMA_FAILED AS 'LOCK_USER_SCHEMA_FAILED',LOGICAL_STANDBY_APPLY AS 'LOGICAL_STANDBY_APPLY',LOGMINER_SESSION_MISMATCH AS 'LOGMINER_SESSION_MISMATCH',MULTI_PX_MISMATCH AS 'MULTI_PX_MISMATCH',MV_QUERY_GEN_MISMATCH AS 'MV_QUERY_GEN_MISMATCH',MV_REWRITE_MISMATCH AS 'MV_REWRITE_MISMATCH',MV_STALEOBJ_MISMATCH AS 'MV_STALEOBJ_MISMATCH',NO_TRIGGER_MISMATCH AS 'NO_TRIGGER_MISMATCH',OPTIMIZER_MISMATCH AS 'OPTIMIZER_MISMATCH',OPTIMIZER_MODE_MISMATCH AS 'OPTIMIZER_MODE_MISMATCH',OUTLINE_MISMATCH AS 'OUTLINE_MISMATCH',OVERLAP_TIME_MISMATCH AS 'OVERLAP_TIME_MISMATCH',PDDL_ENV_MISMATCH AS 'PDDL_ENV_MISMATCH',PDML_ENV_MISMATCH AS 'PDML_ENV_MISMATCH',PLSQL_CMP_SWITCHS_DIFF AS 'PLSQL_CMP_SWITCHS_DIFF',PLSQL_DEBUG AS 'PLSQL_DEBUG',PQ_SLAVE_MISMATCH AS 'PQ_SLAVE_MISMATCH',PURGED_CURSOR AS 'PURGED_CURSOR',PX_MISMATCH AS 'PX_MISMATCH',REMOTE_MAPPING_MISMATCH AS 'REMOTE_MAPPING_MISMATCH',REMOTE_TRANS_MISMATCH AS 'REMOTE_TRANS_MISMATCH',ROLL_INVALID_MISMATCH AS 'ROLL_INVALID_MISMATCH',SLAVE_QC_MISMATCH AS 'SLAVE_QC_MISMATCH',SQL_TYPE_MISMATCH AS 'SQL_TYPE_MISMATCH',STATS_ROW_MISMATCH AS 'STATS_ROW_MISMATCH',STB_OBJECT_MISMATCH AS 'STB_OBJECT_MISMATCH',TOP_LEVEL_DDL_MISMATCH AS 'TOP_LEVEL_DDL_MISMATCH',TOP_LEVEL_RPI_CURSOR AS 'TOP_LEVEL_RPI_CURSOR',TRANSLATION_MISMATCH AS 'TRANSLATION_MISMATCH',TYPCHK_DEP_MISMATCH AS 'TYPCHK_DEP_MISMATCH',TYPECHECK_MISMATCH AS 'TYPECHECK_MISMATCH',UNBOUND_CURSOR AS 'UNBOUND_CURSOR',USE_FEEDBACK_STATS AS 'USE_FEEDBACK_STATS',USER_BIND_PEEK_MISMATCH AS 'USER_BIND_PEEK_MISMATCH')) ORDER BY 2,1) WHERE SQL_ID='&sqlid' GROUP BY INST_ID, SQL_ID, REASON_NAME, REASON_VALUE HAVING REASON_VALUE='Y' ORDER BY COUNT(*) DESC;
note:
查询上面的问题SQL version count 高和no shared 原因为BIND_EQUIV_FAILURE, 这是ACS(实际上是ECS)的典型,11g 12c 19c最佳实践一直在禁用该特性, 12.2 由于 Bind_equiv_failure 引发 SQL 不能共享进而造成 Cursor Mutex: x (Doc ID 2610645.1)提到的bug也是ACS导致的.
查询v$sql的is_bind_sensitive,is_bind_aware 并且也都为Y。ACS问题在13年左右第一次遇到时也是一个1秒的SQL执行时间突然变成了4分钟。
禁用ACS
alter system set "_optimizer_use_feedback"=false; alter system set "_optimizer_adaptive_cursor_sharing"=false; alter system set "_optimizer_extended_cursor_sharing"=none ; alter system set "_optimizer_extended_cursor_sharing_rel"=none;
note:
另外禁用绑定变量窥探(set “_optim_peek_user_binds”=false)也可以变象实现禁用ACS.
对不起,这篇文章暂时关闭评论。