首页 » OGG, ORACLE 9i-23ai » OGG Integrated replicat process Abend with error Ora-4031 “streams pool” “apply shared t”,”commbuf_knasctx[0]”

OGG Integrated replicat process Abend with error Ora-4031 “streams pool” “apply shared t”,”commbuf_knasctx[0]”

在 Oracle GoldenGate 中使用集成模式时,STREAMS_POOL 起着至关重要的作用。集成进程从“STREAMS POOL”获取共享内存。STREAMS POOL 是 SGA 的内存组件之一。STREAMS_POOL_SIZE 的大小应根据数据库系统中使用的集成提取的数量来确定。我们还应该考虑在数据库中使用 STREAMS POOL 的其他进程。最近一个案例ogg异常报错,因ora-4031 streams pool不足,简单记录。

OGG 报错

 WARNING OGG-02531 Oracle GoldenGate Delivery for Oracle, ANBOB.prm: Inbound server OGG$ANBOB error status 
 ORA-600:ORA-00448: normal completion of background process
ERROR OGG-01668 Oracle GoldenGate Delivery for Oracle, ANBOB.prm: PROCESS ABENDING.

检查数据库异常

adrci> show incident

ADR Home = /u01/app/oracle/diag/rdbms/weejar/weejar1:
*************************************************************************
INCIDENT_ID          PROBLEM_KEY                                                 CREATE_TIME                              
-------------------- ----------------------------------------------------------- ---------------------------------------- 
...    
1394870              ORA 4031                                                    2025-01-09 02:47:25.876000 +08:00       
1392446              ORA 4031                                                    2025-01-09 02:48:21.884000 +08:00       
1391406              ORA 4031                                                    2025-01-09 02:50:50.468000 +08:00       
1391470              ORA 4031                                                    2025-01-09 02:50:50.646000 +08:00       
1390182              ORA 4031                                                    2025-01-09 02:52:32.182000 +08:00       
11 rows fetched

adrci> show incident -mode detail -p "incident_id=1394870"

ADR Home = /u01/app/oracle/diag/rdbms/weejar/weejar1:
*************************************************************************

**********************************************************
INCIDENT INFO RECORD 1
**********************************************************
   INCIDENT_ID                   1394870
   STATUS                        ready
   CREATE_TIME                   2025-01-09 02:47:25.876000 +08:00
   PROBLEM_ID                    1
   CLOSE_TIME                    
   FLOOD_CONTROLLED              none
   ERROR_FACILITY                ORA
   ERROR_NUMBER                  4031
   ERROR_ARG1                    50024
   ERROR_ARG2                    streams pool
   ERROR_ARG3                    unknown object
   ERROR_ARG4                    apply shared t
   ERROR_ARG5                    commbuf_knasctx[0]
   ERROR_ARG6                    
   ERROR_ARG7                    
   ERROR_ARG8                    
   ERROR_ARG9                    
   ERROR_ARG10                   
   ERROR_ARG11                   
   ERROR_ARG12                   
   SIGNALLING_COMPONENT          KGH
   SIGNALLING_SUBCOMPONENT       
   SUSPECT_COMPONENT             
   SUSPECT_SUBCOMPONENT          
   ECID                          
   IMPACTS                       0
   PROBLEM_KEY                   ORA 4031
   FIRST_INCIDENT                1005169
   FIRSTINC_TIME                 2020-03-23 00:45:49.181000 +08:00
   LAST_INCIDENT                 1390182
   LASTINC_TIME                  2025-01-09 02:52:32.182000 +08:00
   IMPACT1                       34668547
   IMPACT2                       34668546
   IMPACT3                       0
   IMPACT4                       0
   KEY_NAME                      Client ProcId
   KEY_VALUE                     oracle@weejar01.33229_140640389834560
   KEY_NAME                      SID
   KEY_VALUE                     4640.2837
   KEY_NAME                      ProcId
   KEY_VALUE                     745.21
   OWNER_ID                      1
   INCIDENT_FILE                 /u01/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_1394870/weejar1_as12_33229_i1394870.trc
   OWNER_ID                      1
   INCIDENT_FILE                 /u01/app/oracle/diag/rdbms/weejar/weejar1/trace/weejar1_as12_33229.trc

 adrci> show trace /u01/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_1394870/weejar1_as12_33229_i1394870.trc
Output the results to file: /tmp/utsout_442777_1404_2.ado

/u01/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_1394870/weejar1_as12_33229_i1394870.trc
----------------------------------------------------------
LEVEL PAYLOAD
----- ------------------------------------------------------------------------------------------------------------------------------------------------
      Dump file /u01/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_1394870/weejar1_as12_33229_i1394870.trc
      Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
      With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
      Data Mining and Real Application Testing options
      ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1
      System name:      Linux
      Node name:        weejar01
      Release:  3.10.0-957.21.3.el7.x86_64
      Version:  #1 SMP Fri Jun 14 02:54:29 EDT 2019
      Machine:  x86_64
      Instance name: weejar1
      Redo thread mounted by this instance: 1
      Oracle process number: 745
      Unix process pid: 33229, image: oracle@weejar01 (AS12)


      *** 2025-01-09 02:47:25.904
      *** SESSION ID:(4640.2837) 2025-01-09 02:47:25.904
      *** CLIENT ID:() 2025-01-09 02:47:25.904
      *** SERVICE NAME:(SYS$USERS) 2025-01-09 02:47:25.904
      *** MODULE NAME:() 2025-01-09 02:47:25.904
      *** ACTION NAME:() 2025-01-09 02:47:25.904

      Dump continued from file: /u01/app/oracle/diag/rdbms/weejar/weejar1/trace/weejar1_as12_33229.trc
1>     ***** Error Stack *****
       ORA-04031: unable to allocate 50024 bytes of shared memory ("streams pool","unknown object","apply shared t","commbuf_knasctx[0]")
1< ***** Error Stack ***** 1>     ***** Dump for incident 1394870 (ORA 4031) *****

       *** 2025-01-09 02:47:25.905
       dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0)
2>      ***** SQL Statement (None) *****
        Current SQL information unavailable - no cursor.
2< ***** current_sql_statement ***** 2>      ***** Call Stack Trace *****
        calling              call     entry                argument values in hex
        location             type     point                (? means dubious value)
        -------------------- -------- -------------------- ----------------------------

        *** 2025-01-09 02:47:26.043
...
        dbkePostKGE_kgsf()+  call     dbgePostErrorKGE()   00C2F3B40 ? 7FE96476B910 ?
        71                                                 000000FBF ? 000000001 ?
                                                           000000000 ? 000000002 ?
        kgesev()+280         call     dbkePostKGE_kgsf()   00C2F3B40 ? 7FE96476B910 ?
                                                           000000FBF ? 000000001 ?
                                                           000000000 ? 000000002 ?
        kgesec5()+140        call     kgesev()             00C2F3B40 ? 7FE96476B910 ?
                                                           000000FBF ? 000000005 ?
                                                           7FFFC3A608F0 ? 000000002 ?
        kghnospc()+2316      call     kgesec5()            00C2F3B40 ? 7FE96476B910 ?
                                                           000000FBF ? 000000000 ?
                                                           00000C368 ? 000000001 ?
        kghalo()+1956        call     kghnospc()           00C2F3B40 ? 20E0C74A30 ?
                                                           00000C368 ? 000000000 ?
                                                           00000C368 ? 000000001 ?
        knasinit()+736       call     kghalo()             00C2F3B40 ? 20E0C74A30 ?
                                                           000000000 ? 00000C368 ?
                                                           00000C368 ? 060004CE8 ?
        knasm2()+941         call     knasinit()           7FE96116CF38 ? 1DF7403820 ?

Note:
问题时间有ORA-4031错误

Kghnospc ==>kernel generic heap manager no space available in the heap, signal an error
Kghalo ==>Kernel generic heap manager main allocation entry point
Knasi ==>Kernel replication streams apply server init

什么是ASnn进程

ASnn Database Apply Reader or Apply Server Computes dependencies between logical change records (LCRs) and assembles messages into transactions (Reader Server)

Applies LCRs to database objects or passes LCRs and user messages to their appropriate apply handlers (Apply Server)

When the reader server finishes computing dependencies between LCRs and assembling transactions, it returns the assembled transactions to the coordinator process. Query V$STREAMS_APPLY_READER, V$XSTREAM_APPLY_READER, and V$GG_APPLY_READER for information about the reader server background process.

An apply server receives the transactions from the coordinator background process, and either applies database changes in LCRs or sends LCRs or messages to apply handlers. Apply servers can also enqueue a queue. If an apply server encounters an error, then it then tries to resolve the error with a user-specified conflict handler or error handler. If an apply server cannot resolve an error, then it rolls back the transaction and places the entire transaction, including all of its messages, in the error queue. When an apply server commits a completed transaction, this transaction has been applied. When an apply server places a transaction in the error queue and commits, this transaction also has been applied. Query V$STREAMS_APPLY_SERVER for information about the apply server background process. For XStream Inbound servers, query V$XSTREAM_APPLY_SERVER. For GoldenGate Integrated Replicat, query V$GG_APPLY_SERVER.

The coordinator process name is ASnn, where nn can include letters and numbers.

Database instances, XStream Outbound servers, XStream Inbound servers, GoldenGate Integrated Replic

检查Streams Pool 大小

SQL> show parameter sga

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
_memory_imm_mode_without_autosga     boolean                FALSE
lock_sga                             boolean                FALSE
pre_page_sga                         boolean                FALSE
sga_max_size                         big integer            160G
sga_target                           big integer            0
SQL> show parameter target

NAME                                 TYPE                   VALUE
------------------------------------ ---------------------- ------------------------------
archive_lag_target                   integer                0
db_flashback_retention_target        integer                1440
fast_start_io_target                 integer                0
fast_start_mttr_target               integer                0
memory_max_target                    big integer            0
memory_target                        big integer            0
parallel_servers_target              integer                768
pga_aggregate_target                 big integer            30G
sga_target                           big integer            0
SQL> 

QL> select * from v$sgainfo

NAME                                                                       BYTES RESIZE
---------------------------------------------------------------- --------------- ------
Fixed SGA Size                                                           2262656 No
Redo Buffers                                                           304508928 No
Buffer Cache Size                                                   118111600640 Yes
Shared Pool Size                                                     19327352832 Yes
Large Pool Size                                                       3758096384 Yes
Java Pool Size                                                        3758096384 Yes
Streams Pool Size                                                    21474836480 Yes
Shared IO Pool Size                                                            0 Yes
Granule Size                                                           536870912 No
Maximum SGA Size                                                    171031724032 No
Startup overhead in Shared Pool                                       9093687064 No

NAME                                                                       BYTES RESIZE
---------------------------------------------------------------- --------------- ------
Free SGA Memory Available                                             4294967296

Note:
SGA手动管理,当前streams pool 在20G左右

检查4031 trace 中的streams pool最大内存区

     ==============================================
      TOP 10 MEMORY USES FOR STREAMS POOL
      ----------------------------------------------
      "apply shared t            "         18 GB 88%
      "free memory               "       2401 MB 12%
      "apply shared transactions "        116 KB  0%
      "recov_kgqbtctx            "        110 KB  0%
      "Sender info               "         60 KB  0%
      "kwqbcqini:spilledovermsgs "         29 KB  0%
      "kwqbsinfy:bqg             "         13 KB  0%
      "recov_kgqmctx             "         10 KB  0%
      "kwqbsinfy:mpr             "          5 KB  0%
      "kwqbsinfy:sta             "          5 KB  0%
      TOTALS ---------------------------------------
      Total free memory                  2401 MB
      Total memory alloc.                  18 GB
      Grand total                          20 GB
      ==============================================
      Memory Utilization of Subpool 1 (STREAMS POOL)
      ==============================================
            Allocation Name            Size
      ___________________________  ____________
      "free memory              "    2517842720
      "miscellaneous            "             0
      "so_knlsga                "          3800
      "kwqbcqini:spilledovermsgs"         30096
      "deqtree_kgqmctx          "          1368
      "time manager index       "          1368
      "qulptr_kwqbscc           "             0
      "apply shared t           "   18956720952
      "substree_kgqmctx         "          1368
      "spilled:kwqbl            "          2944
      "recov_kgqbtctx           "        113088
      "sob_kgqmrec              "             0
      "recov_kgqmctx            "         10488
      "recov_kgqmsub            "             0
      "Sender info              "         61656
      "kwqpepseq                "             0
      "knlso:so                 "             0
      "kgqmsub                  "             0
      "name_kgqmsub             "             0
      "msgtree_kgqmctx          "          1368
      "apply shared transactions"        119016
      "heap_knlvpso             "             0
      "fixed allocation callback"           448
      "Free sender info segarray"             0
      "image handles            "          1584
      "kgqbt_alloc_block        "             0
      "kodpaih3 image           "             0
      "kwqbsinfy:bqg            "         14184
      "kwqbsinfy:sta            "          5376
      "kwqbsinfy:cco            "           376
      "kwqbsinfy:bms            "          1408
      "kwqbsinfy:mpr            "          5984
      "kwqbcfsg-fr1             "             0
      "kgqmdm_fl_1              "           792
      ==============================================
      Memory Utilization of Subpool 2 (STREAMS POOL)
      ==============================================
            Allocation Name            Size
      ___________________________  ____________
      ==============================================
      Memory Utilization of Subpool 3 (STREAMS POOL)
      ==============================================
            Allocation Name            Size
      ___________________________  ____________
      ==============================================
      Memory Utilization of Subpool 4 (STREAMS POOL)
      ==============================================
            Allocation Name            Size
      ___________________________  ____________
      ==============================================
      Memory Utilization of Subpool 5 (STREAMS POOL)
      ==============================================
            Allocation Name            Size
      ___________________________  ____________
      ==============================================
      Memory Utilization of Subpool 6 (STREAMS POOL)
      ==============================================
            Allocation Name            Size
      ___________________________  ____________
      ==============================================
      Memory Utilization of Subpool 7 (STREAMS POOL)
      ==============================================
            Allocation Name            Size				

note:
仅有一个subpool, 且近90%为”apply shared t”. 如果没有生成或手动输出heap使用,可以使用如下heapdump.

Heap “apply shared t” Keep On Increasing Which Configured Goldengate Integrated Process (Doc ID 2995731.1)

 connect / as sysdba
 oradebug setmypid
 oradebug unlimit
 oradebug dump heapdump 536870976
 oradebug tracefile_name
 oradebug close_trace

SGA breakdown difference

  • ordered by Pool, Name
  • N/A value for Begin MB or End MB indicates the size of that Pool/Name was insignificant, or zero in that snapshot
Pool Name Begin MB End MB % Diff
java free memory 3,584.00 3,584.00 0.00
large free memory 3,575.81 3,575.81 0.00
shared ASH buffers 192.00 192.00 0.00
shared ASM extent pointer array 416.41 416.35 -0.02
shared Checkpoint queue 468.80 468.80 0.00
shared FileOpenBlock 486.40 486.40 0.00
shared KGLH0 2,768.91 2,744.05 -0.90
shared KGLHD 457.91 452.98 -1.08
shared SQLA 3,407.29 3,410.39 0.09
shared db_block_hash_buckets 712.00 712.00 0.00
shared free memory 996.88 1,010.43 1.36
shared gc name table 480.00 480.00 0.00
shared gcs resources 3,167.77 3,167.77 0.00
shared gcs shadows 2,193.07 2,193.07 0.00
shared ges resource 189.20 189.20 0.00
shared kglsim object batch 267.25 267.25 0.00
streams apply shared t 18,138.56 15,051.64 -17.02
streams free memory 2,341.18 5,428.10 131.85
buffer_cache 112,640.00 112,640.00 0.00
fixed_sga 2.16 2.16 0.00
log_buffer 290.40 290.40 0.00

Note:
问题时apply shared t也释放出来近3G,说明内存还并非一直增加,也有可能是因为进程的退出。

问题时间DB Alert 报错

WARNING: inbound connection timed out (ORA-3136)
Thu Jan 09 02:47:25 2025
Errors in file /u01/app/oracle/diag/rdbms/weejar/weejar1/trace/weejar1_as12_33229.trc  (incident=1394870):
ORA-04031: unable to allocate 50024 bytes of shared memory ("streams pool","unknown object","apply shared t","commbuf_knasctx[0]")
Incident details in: /u01/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_1394870/weejar1_as12_33229_i1394870.trc

虽然还有2G free,可能是存在一些碎片。该库的大事务比较多。从当时的AWR 后台进程wait 判断

Background Wait Events

Event Waits %Time -outs Total Wait Time (s) Avg wait (ms) Waits /txn % bg time
Streams apply: waiting for dependency 769,467 36 4,689 6 0.22 25.57

如果出现 WAIT DEPENDENCY wait event.说明存在较大大事务(more than 999 lcrs/tx)。或是应用的表上有外键无索引。

可以使用 EAGER_SIZE 参数调整 Integrated Replicat 性能
如果事务更改了表中超过 15100 行,Goldengate 会认为该事务很大(在 12.2 版中已更改。在早期版本中,该值是 9500),一个重要的参数强制 Goldengate 如何应用这些“大型”事务。它被称为 EAGER_SIZE。本质上对于 Oracle GoldenGate 而言,这意味着当我在事务中看到大量 LCR 时,我是否立即开始应用它们(我猜这就是参数名称的“eager”部分派生的地方)还是等待整个事务提交然后才开始应用更改。这种“等待”似乎使应用过程序列化,并在很大程度上增加了目标的应用滞后。
如由于最大事务的大小现在是 20,000 行,我们需要将 EAGER_SIZE 设置为比该更高的值。请注意,增加 EAGER_SIZE 会对 STREAMS_POOL_SIZE 提出额外的内存要求。

 DBOPTIONS INTEGRATEDPARAMS(parallelism  8,EAGER_SIZE 25000)

解决方法

该环境的OGG是18.1, 数据库为11.2.0.4,有近20个IR 应用进程组。  当前的现象未发现匹配已知BUG. 基本原因是因为数据库的Streams Pool 耗尽,出现了ORA-4031 错误,影响了因为使用Stream的进程异常,甚至在创建进程时失败。如上面日志中的knasinit函数调用。

大多数情况下,如果在streams pool中出现 ORA-04031 错误,这表明该池对于工作负载来说太小了。发生在当大的事务时,没有足够的内存申请。

方案一,增加stream pool大小 (推荐)
正常情况下

 stream_pool_size =  (1.5 GB * # of integrated Replicats) + 25% head room

如20个进程进程
(1.5 GB * 20) * 1.25 = 37.5 GB    STREAMS_POOL_SIZE =38  G

方案二,增加stream pool 使用上限参数, 降低MAX_SGA_SIZE使用

集成Replicat的Streams池的大小需求基于一个参数MAX_SGA_SIZE。MAX_SGA_SIZE参数默认为INFINITE,允许Replicat进程使用尽可能多的Streams池。Oracle不建议设置MAX_SGA_SIZE参数。 单位MB, Replicat参数文件中的max_sga_size指的是要从streams_pool_size中分配的内存部分, 不是DB instance的SGA.

默认情况下,每个集成提取过程都要求日志挖掘服务器以 1-2G 的 MAX_SGA_SIZE 运行。在配置集成提取时,我们使用以下参数.

DBOPTIONS INTEGRATEDPARAMS(MAX_SGA_SIZE 1024)

在 GoldenGate 集成模式环境中调整 STREAMS POOL SIZE 的大小时,我们需要考虑三种情况:

  • 如果STREAMS_POOL_SIZE大于1G,则max_sga_size等于1G,否则将自动占用STREAMS_POOL_SIZE的75%。
  • 如果未设置 STEAMS_POOL_SIZE,则 MAX_SGA_SIZE(流大小)将占用 SHARED_POOL_SIZE 的 10%,最大为 1 GB。
  • 因此,我们需要考虑 SHARED POOL 的大小。如果您正在使用动态 SGA,则可以通过查询视图 V$SGA_DYNAMIC_COMPONENTS 获取 STREAMS POOL 的当前利用率。使用此视图,您可以了解 SGA 所有内存组件的当前使用大小。

下调参数可能会影响ogg apply变慢,但是可以控制错误的发生.

方案三,改用parallel 非集成模式应用
OGG 18支持parallel 非集成模式,在parallel 非集成模式不使用streams pool. 而是拆成多进进程并行,以SQL形式运行。

但是需要IR进程重建。

Known issue:
Bug 25209912 ogg-db: Integrated Parallel Replicat repeatedly abends with batch txns due to ORA-4031-streams pool leak –ogg 12.2
Bug 8857940 Enhancement to group durations to help reduce chance of ORA-4031 — db 11.2 duration
Starting OGG Integrated Replicat Leads To Database Crashes With ORA-04031: unable to allocate 3280 bytes of shared memory (“streams pool”,”unknown object”,”streams pool”,”sob_kgqmrec”) (Doc ID 2432457.1) Replicat was configured with max_sga_size value greater than the database SGA size
GoldenGate Integrated REPLICAT appears to be hung: BATCHSQL with large “apply shared t” in the Streams Pool (Doc ID 1985152.1)
Bug 26354017 exponantial growth apply shared t component — ogg 12.1.2

Heap “apply shared t” Keep On Increasing Which Configured Goldengate Integrated Process (Doc ID 2995731.1) ogg 19.1.0.0.0 and later

— over —

打赏

, ,

目前这篇文章还没有评论(Rss)

我要评论