Troubleshooting: Oracle 19c wait event ‘latch: MGA shared context root latch’
最近一套客户环境从12C 升级19c(19.9)后在我们的监控上Wait class “Concurrency”等待告警进入TOP, 但是从V$SESSION和ASH view中的活动(active)的会话并未显示这么严重等待, AWR的top event中同样也不存在, 但是算在了Wait Classes by Total Wait Time和Background Wait Events里面,具体的wait event为“latch: MGA shared context root latch” , 1小时的AWR些latch wait 时间高达55,141秒,开始Oracle反馈这应该只是AWR数据计算错误,但其它相同版本环境并没有这么明显,监控页面天天“红”,又不是股市大盘,不能忍,所以想再继续探索一下。
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Avg Wait | % DB time | Wait Class |
---|---|---|---|---|---|
DB CPU | 84.3K | 62.9 | |||
SQL*Net message from dblink | 6,196,574 | 12.3K | 1.99ms | 9.2 | Network |
db file sequential read | 9,531,039 | 5091.2 | 534.17us | 3.8 | User I/O |
SQL*Net more data from dblink | 1,590,424 | 2469.4 | 1.55ms | 1.8 | Network |
log file sync | 2,854,921 | 2110.8 | 739.36us | 1.6 | Commit |
db file scattered read | 1,210,515 | 1156.1 | .96ms | .9 | User I/O |
gc cr grant 2-way | 4,635,419 | 628.8 | 135.66us | .5 | Cluster |
enq: TX – row lock contention | 930 | 495.4 | 532.68ms | .4 | Application |
db file parallel read | 330,082 | 421 | 1.28ms | .3 | User I/O |
read by other session | 336,075 | 395.4 | 1.18ms | .3 | User I/O |
Wait Classes by Total Wait Time
Wait Class | Waits | Total Wait Time (sec) | Avg Wait Time | % DB time | Avg Active Sessions |
---|---|---|---|---|---|
DB CPU | 84,336 | 62.9 | 23.3 | ||
Concurrency | 1,545,430 | 55,513 | 35.92ms | 41.4 | 15.3 |
Network | 258,734,071 | 15,173 | 58.64us | 11.3 | 4.2 |
User I/O | 12,480,920 | 7,263 | 581.95us | 5.4 | 2.0 |
Commit | 2,854,926 | 2,111 | 739.36us | 1.6 | 0.6 |
Cluster | 10,029,048 | 1,921 | 191.53us | 1.4 | 0.5 |
System I/O | 4,798,097 | 1,915 | 399.07us | 1.4 | 0.5 |
Other | 14,458,260 | 1,380 | 95.47us | 1.0 | 0.4 |
Application | 10,135 | 500 | 49.36ms | .4 | 0.1 |
Configuration | 3,085 | 3 | 1.07ms | .0 | 0.0 |
Host CPU
CPUs | Cores | Sockets | Load Average Begin | Load Average End | %User | %System | %WIO | %Idle |
---|---|---|---|---|---|---|---|---|
136 | 80 | 4 | 34.55 | 48.96 | 16.3 | 6.0 | 1.1 | 77.2 |
Background Wait Events
Event | Waits | %Time -outs | Total Wait Time (s) | Avg wait | Waits /txn | % bg time |
---|---|---|---|---|---|---|
latch: MGA shared context root latch | 227,710 | 0 | 55,141 | 242.15ms | 0.06 | 1108.62 |
log file parallel write | 2,841,117 | 0 | 1,422 | 500.37us | 0.75 | 28.58 |
latch free | 434,003 | 0 | 439 | 1.01ms | 0.11 | 8.82 |
db file parallel write | 666,437 | 0 | 419 | 628.45us | 0.18 | 8.42 |
DLM cross inst call completion | 515,577 | 50 | 226 | 438.96us | 0.14 | 4.55 |
latch: MGA shared context latch | 144,990 | 0 | 103 | 708.93us | 0.04 | 2.07 |
主要还是latch: MGA shared context root latch, 对于这个latch 还是头一次见,对于Oracle我们都知道SGA, PGA, UGA, 但是MGA 以是什么东东?
什么是MGA?
对于MGA的资料并不是很多,首先是MOS上的MGA (Managed Global Area) Reference Note (Doc ID 2638904.1),和Mike.Dietrich在他的BLOG上介绍了一些, MGA是从12.1时先引入,在非XD环境中是从18c对外开放,但未在v$pgastat中显示具体值,从19c开始可以从v$pgastat 视图查看MGA大小。MGA的目的是优化共享内存, MGA 允许“较小的一组”进程(甚至所有进程)在持续时间内(通常是查询)共享地址空间(namespace)。MGA 由包含segment和heap(KGH)的namespace组成,MOS上举例如:
Processes can attach to their namespace for the duration they require. For e.g., in a parallel query environment, PQs participating with the QC can share the namespace from QC to share the results. Once done, they will detach from the same and the namespace can be tiered down.
MGA又不同于SGA和PGA,他是弹性按需创建,进程可以动态的创建和删除MGA segment,同时又可以在部分和全部进程间共享。大小是计入PGA(v$pgastat),又独立于PGA, 同时也受PGA的限制(PGA_AGGREGATE_LIMIT), 所以在升级19c(18c)后需要增加额外的PGA大小,上一篇笔记关于PGA_AGGREGATE_LIMIT中有记录.
因为MGA是共享内存区,所以需要使用latch: MGA 来控制访问和保护。相关的几个event:
SQL> @sed mga Show wait event descriptions matching %mga%.. EVENT# EVENT_NAME WAIT_CLASS PARAMETER1 PARAMETER2 PARAMETER3 ENQUEUE_NAME REQ_REASON REQ_DESCRIPTION ------ ------------------------------------------------------- -------------------- ------------------------- ------------------------- ------------------------- ------------------------------ -------------------------------- ---------------------------------------------------------------------------------------------------- 1520 enq: MG - MGA allocation Other name|mode qcserial# rowsource#/mgacode# MGA service MGA allocation synchronize MGA service MGA allocation 91 latch: MGA asr alloc latch Concurrency address number why 89 latch: MGA heap latch Concurrency address number why 90 latch: MGA pid alloc latch Concurrency address number why 88 latch: MGA shared context latch Concurrency address number why 87 latch: MGA shared context root latch Concurrency address number why 6 rows selected.
MGA当前版本用于 IPC, MGA hash join, IMCDT and PQ 。查看MGA的使用:
SQL> @pga --v$pgastat Show PGA manager stats from V$PGASTAT NAME VALUE UNIT ---------------------------------------------------------------- ---------- ------------ aggregate PGA target parameter 81920 megabytes aggregate PGA auto target 5120 megabytes global memory bound 1024 megabytes total PGA inuse 86163 megabytes total PGA allocated 96262 megabytes maximum PGA allocated 100310 megabytes total freeable PGA memory 4855 megabytes MGA allocated (under PGA) 35328 megabytes maximum MGA allocated 35328 megabytes process count 16424 max processes count 17707 PGA memory freed back to OS 25515888 megabytes oracle@anbob:/dev/shm> ls -lrth ... -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_58.dat -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_62.dat -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_22.dat -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_51.dat -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_45.dat -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_2.dat -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_18.dat -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_23.dat -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_65.dat -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_8.dat -rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_30.dat oracle@anbob:/dev/shm> ls -l |grep MGA|awk '{sum+=$5} END{print "TOTAL:"sum/1024/1024"M"}' TOTAL:36352M
MGA参数
SQL> @pd mga Show all parameters and session values from x$ksppi/x$ksppcv... NUM N_HEX NAME VALUE DESCRIPTION ---------- ----- -------------------------------------------------------- ------------------------------ --------------------------------------------------------------------------------------------------- 725 2D5 _ksipc_mga_segment_size 268435456 KSIPC MGA Segment Size 955 3BB _use_large_pages_for_mga FALSE MGA largepage enabled 956 3BC _use_fallocate_for_mga FALSE MGA fallocate enabled 957 3BD _mga_large_page_path large page path
案例分析
SQL> select * from V$SYSTEM_WAIT_CLASS where wait_class<>'Idle' order by 5 desc;
WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS TOTAL_WAITS TIME_WAITED TOTAL_WAITS_FG TIME_WAITED_FG CON_ID
------------- ----------- ---------------------------------------------------------------- ----------- ----------- -------------- -------------- ----------
3875070507 4 Concurrency 189969317 371099593 22148484 3826258 0
2000153315 7 Network 2.9628E+10 294762575 2.9537E+10 283411507 0
1740759767 8 User I/O 1897773868 101487918 1868899105 101240375 0
4108307767 9 System I/O 477500751 22212382 22841680 1245767 0
3871361733 11 Cluster 1161500695 21242636 1139436884 20938719 0
4217450380 1 Application 1555495 18714528 1521597 18708864 0
1893977003 0 Other 1741170608 17534611 1433918686 6984471 0
3386400367 5 Commit 217456799 16820631 217456206 16820600 0
3290255840 2 Configuration 277824 63263 264527 62622 0
4166625743 3 Administrative 461 1132 412 1106 0
SQL> select event,total_waits,TIME_WAITED,AVERAGE_WAIT,TOTAL_WAITS_FG,TIME_WAITED_FG from v$system_event where WAIT_CLASS# =4 order by 3 desc;
EVENT TOTAL_WAITS TIME_WAITED AVERAGE_WAIT TOTAL_WAITS_FG TIME_WAITED_FG
---------------------------------------- ----------- ----------- ------------ -------------- --------------
latch: MGA shared context root latch 29303750 365956501 12.49 0 0
library cache lock 9507250 1937308 .2 8470758 1908916
latch: MGA shared context latch 20077941 1349800 .07 2320 7937
row cache lock 485870 857088 1.76 339737 596077
enq: TX - index contention 1528387 498004 .33 1525545 497924
buffer busy waits 4431987 213996 .05 4371371 213241
...
25 rows selected.
SQL> select * from v$event_histogram where event='latch: MGA shared context root latch';
EVENT# EVENT WAIT_TIME_MILLI WAIT_COUNT LAST_UPDATE_TIME CON_ID
---------- ---------------------------------------- --------------- ---------- ---------------------------------------------------------------- ----------
87 latch: MGA shared context root latch 1 2165473 15-JUL-21 11.23.21.465320 AM +08:00 0
87 latch: MGA shared context root latch 2 3307116 15-JUL-21 11.23.21.946395 AM +08:00 0
87 latch: MGA shared context root latch 4 4153077 15-JUL-21 11.23.21.939756 AM +08:00 0
87 latch: MGA shared context root latch 8 4060755 15-JUL-21 11.23.21.808073 AM +08:00 0
87 latch: MGA shared context root latch 16 2617221 15-JUL-21 11.23.21.706092 AM +08:00 0
87 latch: MGA shared context root latch 32 1950897 15-JUL-21 11.23.18.110130 AM +08:00 0
87 latch: MGA shared context root latch 64 1989245 15-JUL-21 11.23.18.064262 AM +08:00 0
87 latch: MGA shared context root latch 128 2249583 15-JUL-21 11.23.13.786579 AM +08:00 0
87 latch: MGA shared context root latch 256 2536114 15-JUL-21 11.23.03.577469 AM +08:00 0
87 latch: MGA shared context root latch 512 2306337 15-JUL-21 11.23.03.258901 AM +08:00 0
87 latch: MGA shared context root latch 1024 1443628 15-JUL-21 11.20.20.323152 AM +08:00 0
87 latch: MGA shared context root latch 2048 443235 15-JUL-21 11.20.13.886305 AM +08:00 0
87 latch: MGA shared context root latch 4096 82319 15-JUL-21 11.00.55.409740 AM +08:00 0
87 latch: MGA shared context root latch 8192 5404 15-JUL-21 11.00.47.264496 AM +08:00 0
14 rows selected.
SQL> SELECT EVENT,P1,P2,COUNT(*),MIN(SAMPLE_TIME),MAX(SAMPLE_TIME) FROM V$ACTIVE_SESSION_HISTORY WHERE EVENT LIKE '%MGA%' group by EVENT,P1,P2;
EVENT P1 P2 COUNT(*) MIN(SAMPLE_TIME) MAX(SAMPLE_TIME)
---------------------------------------- ---------- ---------- ---------- --------------------------------------------------------------------------- ---------------------------------------------------------------------------
latch: MGA shared context latch 4.8616E+10 174 1 15-JUL-21 04.18.39.528 AM 15-JUL-21 04.18.39.528 AM
latch: MGA shared context latch 4.8616E+10 174 1 15-JUL-21 09.03.06.210 AM 15-JUL-21 09.03.06.210 AM
latch: MGA shared context root latch 1611025984 173 23 15-JUL-21 12.45.04.553 AM 15-JUL-21 11.00.48.798 AM
SQL> select latch#,type,name,gets,misses,IMMEDIATE_MISSES,sleeps,spin_gets from v$latch where name like '%MGA%';
LATCH# NAME GETS MISSES IMMEDIATE_MISSES SLEEPS SPIN_GETS
---------- ---------------------------------------------------------------- ---------- ---------- ---------------- ---------- ----------
173 MGA shared context root latch 92649693 27452901 0 50996639 192780
174 MGA shared context latch 858046457 21458639 0 25812111 1342877
175 MGA heap latch 89743211 962829 0 300074 699783
176 MGA pid alloc latch 0 0 0 0 0
177 MGA asr alloc latch 9266 0 0 0 0
742 MGA service latch 0 0 0 0 0
6 rows selected.
SQL> select name,gets,misses,IMMEDIATE_MISSES from v$latch_parent where latch#=173;
NAME GETS MISSES IMMEDIATE_MISSES
---------------------------------------------------------------- ---------- ---------- ----------------
MGA shared context root latch 94598513 28088359 0
SQL> select "WHERE",NWFAIL_COUNT, SLEEP_COUNT, WTR_SLP_COUNT, LONGHOLD_COUNT from v$latch_misses where PARENT_NAME='MGA shared context root latch' AND SLEEP_COUNT >0 order by 4 desc;
WHERE NWFAIL_COUNT SLEEP_COUNT WTR_SLP_COUNT LONGHOLD_COUNT
------------------------------------------------------------ ------------ ----------- ------------- --------------
ksm_mga_detach_whr1 0 26337704 27097864 0
ksm_mga_attach_whr1 0 25880704 25116290 0
ksm_mga_seg_cnt_whr1 0 347 53066 253
ksm_mga_create_whr1 0 1604 5847 14
Note:
当前看主要发生在shared context root latch 上,也可能root latch就单个,所以争用较为集中,如发生成进程初始化阶段,后期版本有可能增加并行,从where的2个函数猜测 ksm_mga_detach_whr1应该是在session logout, 那ksm_mga_attach_whr1 应该是session logon. 那就有可能是连接风暴导致,看一下登录
Snap Id | Snap Time | Sessions | Cursors/Session | Instances | |
---|---|---|---|---|---|
Begin Snap: | 471 | 12-Jul-21 09:00:07 | 16405 | 3.3 | 2 |
End Snap: | 472 | 12-Jul-21 10:00:24 | 16555 | 3.7 | 2 |
Elapsed: | 60.29 (mins) | ||||
DB Time: | 2,234.57 (mins) |
Load Profile
Per Second | Per Transaction | Per Exec | Per Call | |
---|---|---|---|---|
DB Time(s): | 37.1 | 0.0 | 0.00 | 0.00 |
DB CPU(s): | 23.3 | 0.0 | 0.00 | 0.00 |
Background CPU(s): | 1.0 | 0.0 | 0.00 | 0.00 |
SQL Work Area (MB): | 151.2 | 0.1 | ||
Logons: | 44.8 | 0.0 | ||
User logons: | 44.5 | 0.0 | ||
Executes (SQL): | 34,217.2 | 32.5 | ||
Rollbacks: | 271.7 | 0.3 | ||
Transactions: | 1,052.2 |
Note:
当前这个库的会话数较高约达17k, 同时每秒连接LOGON为40也不低, 目前也就这两个指标相比其它同版本环境较高,且只有这个节点问题较为突出, 判断与较高的短连接有关,我们在其它环境初始化了15K会话数,再以40每秒的短连接测试,也可以测试出latch: MGA shared context root latch 快速增加,但未能模拟出像生产相同的现象。
— UPDATE 2022/02/09 —
在oracle 19.12中已定义为 Bug 33352794 HIGH WAIT ON ‘LATCH: MGA SHARED CONTEXT ROOT LATCH’ AND ‘LATCH: MGA SHARED CONTEXT LATCH’, 并在19.13 RU中修复.
AWR reports show waits for MGA latch contention. If high wait on 'latch: MGA shared context root latch' and 'latch: MGA shared context latch' is noticed this bug might be hit.
对不起,这篇文章暂时关闭评论。