首页 » Cloud, ORACLE 9i-23ai » Troubleshooting: Oracle 19c wait event ‘latch: MGA shared context root latch’

Troubleshooting: Oracle 19c wait event ‘latch: MGA shared context root latch’

最近一套客户环境从12C 升级19c(19.9)后在我们的监控上Wait class “Concurrency”等待告警进入TOP,  但是从V$SESSION和ASH view中的活动(active)的会话并未显示这么严重等待, AWR的top event中同样也不存在, 但是算在了Wait Classes by Total Wait Time和Background Wait Events里面,具体的wait event为“latch: MGA shared context root latch” , 1小时的AWR些latch wait 时间高达55,141秒,开始Oracle反馈这应该只是AWR数据计算错误,但其它相同版本环境并没有这么明显,监控页面天天“红”,又不是股市大盘,不能忍,所以想再继续探索一下。

Top 10 Foreground Events by Total Wait Time

Event Waits Total Wait Time (sec) Avg Wait % DB time Wait Class
DB CPU 84.3K 62.9
SQL*Net message from dblink 6,196,574 12.3K 1.99ms 9.2 Network
db file sequential read 9,531,039 5091.2 534.17us 3.8 User I/O
SQL*Net more data from dblink 1,590,424 2469.4 1.55ms 1.8 Network
log file sync 2,854,921 2110.8 739.36us 1.6 Commit
db file scattered read 1,210,515 1156.1 .96ms .9 User I/O
gc cr grant 2-way 4,635,419 628.8 135.66us .5 Cluster
enq: TX – row lock contention 930 495.4 532.68ms .4 Application
db file parallel read 330,082 421 1.28ms .3 User I/O
read by other session 336,075 395.4 1.18ms .3 User I/O

Wait Classes by Total Wait Time

Wait Class Waits Total Wait Time (sec) Avg Wait Time % DB time Avg Active Sessions
DB CPU 84,336 62.9 23.3
Concurrency 1,545,430 55,513 35.92ms 41.4 15.3
Network 258,734,071 15,173 58.64us 11.3 4.2
User I/O 12,480,920 7,263 581.95us 5.4 2.0
Commit 2,854,926 2,111 739.36us 1.6 0.6
Cluster 10,029,048 1,921 191.53us 1.4 0.5
System I/O 4,798,097 1,915 399.07us 1.4 0.5
Other 14,458,260 1,380 95.47us 1.0 0.4
Application 10,135 500 49.36ms .4 0.1
Configuration 3,085 3 1.07ms .0 0.0

Host CPU

CPUs Cores Sockets Load Average Begin Load Average End %User %System %WIO %Idle
136 80 4 34.55 48.96 16.3 6.0 1.1 77.2

 

Background Wait Events

Event Waits %Time -outs Total Wait Time (s) Avg wait Waits /txn % bg time
latch: MGA shared context root latch 227,710 0 55,141 242.15ms 0.06 1108.62
log file parallel write 2,841,117 0 1,422 500.37us 0.75 28.58
latch free 434,003 0 439 1.01ms 0.11 8.82
db file parallel write 666,437 0 419 628.45us 0.18 8.42
DLM cross inst call completion 515,577 50 226 438.96us 0.14 4.55
latch: MGA shared context latch 144,990 0 103 708.93us 0.04 2.07

主要还是latch: MGA shared context root latch, 对于这个latch 还是头一次见,对于Oracle我们都知道SGA, PGA, UGA, 但是MGA 以是什么东东?

什么是MGA?

对于MGA的资料并不是很多,首先是MOS上的MGA (Managed Global Area) Reference Note (Doc ID 2638904.1),和在他的BLOG上介绍了一些, MGA是从12.1时先引入,在非XD环境中是从18c对外开放,但未在v$pgastat中显示具体值,从19c开始可以从v$pgastat 视图查看MGA大小。MGA的目的是优化共享内存, MGA 允许“较小的一组”进程(甚至所有进程)在持续时间内(通常是查询)共享地址空间(namespace)。MGA 由包含segment和heap(KGH)的namespace组成,MOS上举例如:

Processes can attach to their namespace for the duration they require.  For e.g., in a parallel query environment, PQs participating with the QC can share the namespace from QC to share the results.  Once done, they will detach from the same and the namespace can be tiered down.

MGA又不同于SGA和PGA,他是弹性按需创建,进程可以动态的创建和删除MGA segment,同时又可以在部分和全部进程间共享。大小是计入PGA(v$pgastat),又独立于PGA,  同时也受PGA的限制(PGA_AGGREGATE_LIMIT), 所以在升级19c(18c)后需要增加额外的PGA大小,上一篇笔记关于PGA_AGGREGATE_LIMIT中有记录.

因为MGA是共享内存区,所以需要使用latch: MGA 来控制访问和保护。相关的几个event:

SQL> @sed mga
Show wait event descriptions matching %mga%..

EVENT# EVENT_NAME                                              WAIT_CLASS           PARAMETER1                PARAMETER2                PARAMETER3                ENQUEUE_NAME                   REQ_REASON                       REQ_DESCRIPTION
------ ------------------------------------------------------- -------------------- ------------------------- ------------------------- ------------------------- ------------------------------ -------------------------------- ----------------------------------------------------------------------------------------------------
  1520 enq: MG - MGA allocation                                Other                name|mode                 qcserial#                 rowsource#/mgacode#       MGA service                    MGA allocation                   synchronize MGA service MGA allocation
    91 latch: MGA asr alloc latch                              Concurrency          address                   number                    why
    89 latch: MGA heap latch                                   Concurrency          address                   number                    why
    90 latch: MGA pid alloc latch                              Concurrency          address                   number                    why
    88 latch: MGA shared context latch                         Concurrency          address                   number                    why
    87 latch: MGA shared context root latch                    Concurrency          address                   number                    why

6 rows selected.

MGA当前版本用于  IPC, MGA hash join, IMCDT and PQ 。查看MGA的使用:

SQL> @pga --v$pgastat
Show PGA manager stats from V$PGASTAT

NAME                                                                  VALUE UNIT
---------------------------------------------------------------- ---------- ------------
aggregate PGA target parameter                                        81920 megabytes
aggregate PGA auto target                                              5120 megabytes
global memory bound                                                    1024 megabytes
total PGA inuse                                                       86163 megabytes
total PGA allocated                                                   96262 megabytes
maximum PGA allocated                                                100310 megabytes
total freeable PGA memory                                              4855 megabytes
MGA allocated (under PGA)                                             35328 megabytes
maximum MGA allocated                                                 35328 megabytes
process count                                                         16424
max processes count                                                   17707
PGA memory freed back to OS                                        25515888 megabytes

oracle@anbob:/dev/shm> ls -lrth
...
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_58.dat
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_62.dat
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_22.dat
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_51.dat
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_45.dat
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_2.dat
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_18.dat
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_23.dat
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_65.dat
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_8.dat
-rw-r----- 1 oracle asmadmin 512M Jul 12 14:34 ora_6a83e788_1306ed18_1_KSIPC_MGA_NMSPC_1_0_30.dat

oracle@anbob:/dev/shm> ls -l |grep MGA|awk '{sum+=$5} END{print "TOTAL:"sum/1024/1024"M"}'
TOTAL:36352M

MGA参数

SQL> @pd mga
Show all parameters and session values from x$ksppi/x$ksppcv...

       NUM N_HEX NAME                                                     VALUE                          DESCRIPTION
---------- ----- -------------------------------------------------------- ------------------------------ ---------------------------------------------------------------------------------------------------
       725   2D5 _ksipc_mga_segment_size                                  268435456                      KSIPC MGA Segment Size
       955   3BB _use_large_pages_for_mga                                 FALSE                          MGA largepage enabled
       956   3BC _use_fallocate_for_mga                                   FALSE                          MGA fallocate enabled
       957   3BD _mga_large_page_path                                                                    large page path

案例分析

SQL> select * from V$SYSTEM_WAIT_CLASS where wait_class<>'Idle' order by 5 desc;

WAIT_CLASS_ID WAIT_CLASS# WAIT_CLASS                                                       TOTAL_WAITS TIME_WAITED TOTAL_WAITS_FG TIME_WAITED_FG     CON_ID
------------- ----------- ---------------------------------------------------------------- ----------- ----------- -------------- -------------- ----------
   3875070507           4 Concurrency                                                        189969317   371099593       22148484        3826258          0
   2000153315           7 Network                                                           2.9628E+10   294762575     2.9537E+10      283411507          0
   1740759767           8 User I/O                                                          1897773868   101487918     1868899105      101240375          0
   4108307767           9 System I/O                                                         477500751    22212382       22841680        1245767          0
   3871361733          11 Cluster                                                           1161500695    21242636     1139436884       20938719          0
   4217450380           1 Application                                                          1555495    18714528        1521597       18708864          0
   1893977003           0 Other                                                             1741170608    17534611     1433918686        6984471          0
   3386400367           5 Commit                                                             217456799    16820631      217456206       16820600          0
   3290255840           2 Configuration                                                         277824       63263         264527          62622          0
   4166625743           3 Administrative                                                           461        1132            412           1106          0
   
SQL> select event,total_waits,TIME_WAITED,AVERAGE_WAIT,TOTAL_WAITS_FG,TIME_WAITED_FG from v$system_event where  WAIT_CLASS# =4 order by 3 desc;

EVENT                                    TOTAL_WAITS TIME_WAITED AVERAGE_WAIT TOTAL_WAITS_FG TIME_WAITED_FG
---------------------------------------- ----------- ----------- ------------ -------------- --------------
latch: MGA shared context root latch        29303750   365956501        12.49              0              0
library cache lock                           9507250     1937308           .2        8470758        1908916
latch: MGA shared context latch             20077941     1349800          .07           2320           7937
row cache lock                                485870      857088         1.76         339737         596077
enq: TX - index contention                   1528387      498004          .33        1525545         497924
buffer busy waits                            4431987      213996          .05        4371371         213241
...

25 rows selected.

SQL> select * from v$event_histogram where event='latch: MGA shared context root latch';

    EVENT# EVENT                                    WAIT_TIME_MILLI WAIT_COUNT LAST_UPDATE_TIME                                                     CON_ID
---------- ---------------------------------------- --------------- ---------- ---------------------------------------------------------------- ----------
        87 latch: MGA shared context root latch                   1    2165473 15-JUL-21 11.23.21.465320 AM +08:00                                       0
        87 latch: MGA shared context root latch                   2    3307116 15-JUL-21 11.23.21.946395 AM +08:00                                       0
        87 latch: MGA shared context root latch                   4    4153077 15-JUL-21 11.23.21.939756 AM +08:00                                       0
        87 latch: MGA shared context root latch                   8    4060755 15-JUL-21 11.23.21.808073 AM +08:00                                       0
        87 latch: MGA shared context root latch                  16    2617221 15-JUL-21 11.23.21.706092 AM +08:00                                       0
        87 latch: MGA shared context root latch                  32    1950897 15-JUL-21 11.23.18.110130 AM +08:00                                       0
        87 latch: MGA shared context root latch                  64    1989245 15-JUL-21 11.23.18.064262 AM +08:00                                       0
        87 latch: MGA shared context root latch                 128    2249583 15-JUL-21 11.23.13.786579 AM +08:00                                       0
        87 latch: MGA shared context root latch                 256    2536114 15-JUL-21 11.23.03.577469 AM +08:00                                       0
        87 latch: MGA shared context root latch                 512    2306337 15-JUL-21 11.23.03.258901 AM +08:00                                       0
        87 latch: MGA shared context root latch                1024    1443628 15-JUL-21 11.20.20.323152 AM +08:00                                       0
        87 latch: MGA shared context root latch                2048     443235 15-JUL-21 11.20.13.886305 AM +08:00                                       0
        87 latch: MGA shared context root latch                4096      82319 15-JUL-21 11.00.55.409740 AM +08:00                                       0
        87 latch: MGA shared context root latch                8192       5404 15-JUL-21 11.00.47.264496 AM +08:00                                       0

14 rows selected.

SQL> SELECT EVENT,P1,P2,COUNT(*),MIN(SAMPLE_TIME),MAX(SAMPLE_TIME) FROM V$ACTIVE_SESSION_HISTORY WHERE EVENT LIKE '%MGA%' group by EVENT,P1,P2;

EVENT                                            P1         P2   COUNT(*) MIN(SAMPLE_TIME)                                                            MAX(SAMPLE_TIME)
---------------------------------------- ---------- ---------- ---------- --------------------------------------------------------------------------- ---------------------------------------------------------------------------
latch: MGA shared context latch          4.8616E+10        174          1 15-JUL-21 04.18.39.528 AM                                                   15-JUL-21 04.18.39.528 AM
latch: MGA shared context latch          4.8616E+10        174          1 15-JUL-21 09.03.06.210 AM                                                   15-JUL-21 09.03.06.210 AM
latch: MGA shared context root latch     1611025984        173         23 15-JUL-21 12.45.04.553 AM                                                   15-JUL-21 11.00.48.798 AM

SQL> select latch#,type,name,gets,misses,IMMEDIATE_MISSES,sleeps,spin_gets from v$latch where name like '%MGA%';

    LATCH# NAME                                                                   GETS     MISSES IMMEDIATE_MISSES     SLEEPS  SPIN_GETS
---------- ---------------------------------------------------------------- ---------- ---------- ---------------- ---------- ----------
       173 MGA shared context root latch                                      92649693   27452901                0   50996639     192780
       174 MGA shared context latch                                          858046457   21458639                0   25812111    1342877
       175 MGA heap latch                                                     89743211     962829                0     300074     699783
       176 MGA pid alloc latch                                                       0          0                0          0          0
       177 MGA asr alloc latch                                                    9266          0                0          0          0
       742 MGA service latch                                                         0          0                0          0          0

6 rows selected.

SQL> select name,gets,misses,IMMEDIATE_MISSES from v$latch_parent where latch#=173;

NAME                                                                   GETS     MISSES IMMEDIATE_MISSES
---------------------------------------------------------------- ---------- ---------- ----------------
MGA shared context root latch                                      94598513   28088359                0

SQL> select "WHERE",NWFAIL_COUNT, SLEEP_COUNT, WTR_SLP_COUNT, LONGHOLD_COUNT  from v$latch_misses where PARENT_NAME='MGA shared context root latch' AND SLEEP_COUNT >0 order by 4 desc;

WHERE                                                        NWFAIL_COUNT SLEEP_COUNT WTR_SLP_COUNT LONGHOLD_COUNT
------------------------------------------------------------ ------------ ----------- ------------- --------------
ksm_mga_detach_whr1                                                     0    26337704      27097864              0
ksm_mga_attach_whr1                                                     0    25880704      25116290              0
ksm_mga_seg_cnt_whr1                                                    0         347         53066            253
ksm_mga_create_whr1                                                     0        1604          5847             14

Note:

当前看主要发生在shared context root latch 上,也可能root latch就单个,所以争用较为集中,如发生成进程初始化阶段,后期版本有可能增加并行,从where的2个函数猜测 ksm_mga_detach_whr1应该是在session logout, 那ksm_mga_attach_whr1 应该是session logon.  那就有可能是连接风暴导致,看一下登录

Snap Id Snap Time Sessions Cursors/Session Instances
Begin Snap: 471 12-Jul-21 09:00:07 16405 3.3 2
End Snap: 472 12-Jul-21 10:00:24 16555 3.7 2
Elapsed: 60.29 (mins)
DB Time: 2,234.57 (mins)

Load Profile

Per Second Per Transaction Per Exec Per Call
DB Time(s): 37.1 0.0 0.00 0.00
DB CPU(s): 23.3 0.0 0.00 0.00
Background CPU(s): 1.0 0.0 0.00 0.00
SQL Work Area (MB): 151.2 0.1
Logons: 44.8 0.0
User logons: 44.5 0.0
Executes (SQL): 34,217.2 32.5
Rollbacks: 271.7 0.3
Transactions: 1,052.2

Note:
当前这个库的会话数较高约达17k, 同时每秒连接LOGON为40也不低, 目前也就这两个指标相比其它同版本环境较高,且只有这个节点问题较为突出, 判断与较高的短连接有关,我们在其它环境初始化了15K会话数,再以40每秒的短连接测试,也可以测试出latch: MGA shared context root latch 快速增加,但未能模拟出像生产相同的现象。

— UPDATE 2022/02/09 —
在oracle 19.12中已定义为 Bug 33352794  HIGH WAIT ON ‘LATCH: MGA SHARED CONTEXT ROOT LATCH’ AND ‘LATCH: MGA SHARED CONTEXT LATCH’,   并在19.13 RU中修复.

AWR reports show waits for MGA latch contention.
If  high wait on 'latch: MGA shared context root latch' and 'latch: MGA shared
context latch' is noticed this  bug might be hit.

 

打赏

,

对不起,这篇文章暂时关闭评论。