OGG Integrated replicat process Abend with error Ora-4031 “streams pool” “apply shared t”,”commbuf_knasctx[0]”
在 Oracle GoldenGate 中使用集成模式时,STREAMS_POOL 起着至关重要的作用。集成进程从“STREAMS POOL”获取共享内存。STREAMS POOL 是 SGA 的内存组件之一。STREAMS_POOL_SIZE 的大小应根据数据库系统中使用的集成提取的数量来确定。我们还应该考虑在数据库中使用 STREAMS POOL 的其他进程。最近一个案例ogg异常报错,因ora-4031 streams pool不足,简单记录。
OGG 报错
WARNING OGG-02531 Oracle GoldenGate Delivery for Oracle, ANBOB.prm: Inbound server OGG$ANBOB error status ORA-600:ORA-00448: normal completion of background process ERROR OGG-01668 Oracle GoldenGate Delivery for Oracle, ANBOB.prm: PROCESS ABENDING.
检查数据库异常
adrci> show incident ADR Home = /u01/app/oracle/diag/rdbms/weejar/weejar1: ************************************************************************* INCIDENT_ID PROBLEM_KEY CREATE_TIME -------------------- ----------------------------------------------------------- ---------------------------------------- ... 1394870 ORA 4031 2025-01-09 02:47:25.876000 +08:00 1392446 ORA 4031 2025-01-09 02:48:21.884000 +08:00 1391406 ORA 4031 2025-01-09 02:50:50.468000 +08:00 1391470 ORA 4031 2025-01-09 02:50:50.646000 +08:00 1390182 ORA 4031 2025-01-09 02:52:32.182000 +08:00 11 rows fetched adrci> show incident -mode detail -p "incident_id=1394870" ADR Home = /u01/app/oracle/diag/rdbms/weejar/weejar1: ************************************************************************* ********************************************************** INCIDENT INFO RECORD 1 ********************************************************** INCIDENT_ID 1394870 STATUS ready CREATE_TIME 2025-01-09 02:47:25.876000 +08:00 PROBLEM_ID 1 CLOSE_TIME FLOOD_CONTROLLED none ERROR_FACILITY ORA ERROR_NUMBER 4031 ERROR_ARG1 50024 ERROR_ARG2 streams pool ERROR_ARG3 unknown object ERROR_ARG4 apply shared t ERROR_ARG5 commbuf_knasctx[0] ERROR_ARG6 ERROR_ARG7 ERROR_ARG8 ERROR_ARG9 ERROR_ARG10 ERROR_ARG11 ERROR_ARG12 SIGNALLING_COMPONENT KGH SIGNALLING_SUBCOMPONENT SUSPECT_COMPONENT SUSPECT_SUBCOMPONENT ECID IMPACTS 0 PROBLEM_KEY ORA 4031 FIRST_INCIDENT 1005169 FIRSTINC_TIME 2020-03-23 00:45:49.181000 +08:00 LAST_INCIDENT 1390182 LASTINC_TIME 2025-01-09 02:52:32.182000 +08:00 IMPACT1 34668547 IMPACT2 34668546 IMPACT3 0 IMPACT4 0 KEY_NAME Client ProcId KEY_VALUE oracle@weejar01.33229_140640389834560 KEY_NAME SID KEY_VALUE 4640.2837 KEY_NAME ProcId KEY_VALUE 745.21 OWNER_ID 1 INCIDENT_FILE /u01/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_1394870/weejar1_as12_33229_i1394870.trc OWNER_ID 1 INCIDENT_FILE /u01/app/oracle/diag/rdbms/weejar/weejar1/trace/weejar1_as12_33229.trc adrci> show trace /u01/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_1394870/weejar1_as12_33229_i1394870.trc Output the results to file: /tmp/utsout_442777_1404_2.ado /u01/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_1394870/weejar1_as12_33229_i1394870.trc ---------------------------------------------------------- LEVEL PAYLOAD ----- ------------------------------------------------------------------------------------------------------------------------------------------------ Dump file /u01/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_1394870/weejar1_as12_33229_i1394870.trc Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /u01/app/oracle/product/11.2.0/dbhome_1 System name: Linux Node name: weejar01 Release: 3.10.0-957.21.3.el7.x86_64 Version: #1 SMP Fri Jun 14 02:54:29 EDT 2019 Machine: x86_64 Instance name: weejar1 Redo thread mounted by this instance: 1 Oracle process number: 745 Unix process pid: 33229, image: oracle@weejar01 (AS12) *** 2025-01-09 02:47:25.904 *** SESSION ID:(4640.2837) 2025-01-09 02:47:25.904 *** CLIENT ID:() 2025-01-09 02:47:25.904 *** SERVICE NAME:(SYS$USERS) 2025-01-09 02:47:25.904 *** MODULE NAME:() 2025-01-09 02:47:25.904 *** ACTION NAME:() 2025-01-09 02:47:25.904 Dump continued from file: /u01/app/oracle/diag/rdbms/weejar/weejar1/trace/weejar1_as12_33229.trc 1> ***** Error Stack ***** ORA-04031: unable to allocate 50024 bytes of shared memory ("streams pool","unknown object","apply shared t","commbuf_knasctx[0]") 1< ***** Error Stack ***** 1> ***** Dump for incident 1394870 (ORA 4031) ***** *** 2025-01-09 02:47:25.905 dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0) 2> ***** SQL Statement (None) ***** Current SQL information unavailable - no cursor. 2< ***** current_sql_statement ***** 2> ***** Call Stack Trace ***** calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- *** 2025-01-09 02:47:26.043 ... dbkePostKGE_kgsf()+ call dbgePostErrorKGE() 00C2F3B40 ? 7FE96476B910 ? 71 000000FBF ? 000000001 ? 000000000 ? 000000002 ? kgesev()+280 call dbkePostKGE_kgsf() 00C2F3B40 ? 7FE96476B910 ? 000000FBF ? 000000001 ? 000000000 ? 000000002 ? kgesec5()+140 call kgesev() 00C2F3B40 ? 7FE96476B910 ? 000000FBF ? 000000005 ? 7FFFC3A608F0 ? 000000002 ? kghnospc()+2316 call kgesec5() 00C2F3B40 ? 7FE96476B910 ? 000000FBF ? 000000000 ? 00000C368 ? 000000001 ? kghalo()+1956 call kghnospc() 00C2F3B40 ? 20E0C74A30 ? 00000C368 ? 000000000 ? 00000C368 ? 000000001 ? knasinit()+736 call kghalo() 00C2F3B40 ? 20E0C74A30 ? 000000000 ? 00000C368 ? 00000C368 ? 060004CE8 ? knasm2()+941 call knasinit() 7FE96116CF38 ? 1DF7403820 ?
Note:
问题时间有ORA-4031错误
Kghnospc ==>kernel generic heap manager no space available in the heap, signal an error
Kghalo ==>Kernel generic heap manager main allocation entry point
Knasi ==>Kernel replication streams apply server init
什么是ASnn进程
ASnn | Database Apply Reader or Apply Server | Computes dependencies between logical change records (LCRs) and assembles messages into transactions (Reader Server)
Applies LCRs to database objects or passes LCRs and user messages to their appropriate apply handlers (Apply Server) |
When the reader server finishes computing dependencies between LCRs and assembling transactions, it returns the assembled transactions to the coordinator process. Query V$STREAMS_APPLY_READER, V$XSTREAM_APPLY_READER, and V$GG_APPLY_READER for information about the reader server background process.
An apply server receives the transactions from the coordinator background process, and either applies database changes in LCRs or sends LCRs or messages to apply handlers. Apply servers can also enqueue a queue. If an apply server encounters an error, then it then tries to resolve the error with a user-specified conflict handler or error handler. If an apply server cannot resolve an error, then it rolls back the transaction and places the entire transaction, including all of its messages, in the error queue. When an apply server commits a completed transaction, this transaction has been applied. When an apply server places a transaction in the error queue and commits, this transaction also has been applied. Query V$STREAMS_APPLY_SERVER for information about the apply server background process. For XStream Inbound servers, query V$XSTREAM_APPLY_SERVER. For GoldenGate Integrated Replicat, query V$GG_APPLY_SERVER. The coordinator process name is ASnn, where nn can include letters and numbers. |
Database instances, XStream Outbound servers, XStream Inbound servers, GoldenGate Integrated Replic |
检查Streams Pool 大小
SQL> show parameter sga NAME TYPE VALUE ------------------------------------ ---------------------- ------------------------------ _memory_imm_mode_without_autosga boolean FALSE lock_sga boolean FALSE pre_page_sga boolean FALSE sga_max_size big integer 160G sga_target big integer 0 SQL> show parameter target NAME TYPE VALUE ------------------------------------ ---------------------- ------------------------------ archive_lag_target integer 0 db_flashback_retention_target integer 1440 fast_start_io_target integer 0 fast_start_mttr_target integer 0 memory_max_target big integer 0 memory_target big integer 0 parallel_servers_target integer 768 pga_aggregate_target big integer 30G sga_target big integer 0 SQL> QL> select * from v$sgainfo NAME BYTES RESIZE ---------------------------------------------------------------- --------------- ------ Fixed SGA Size 2262656 No Redo Buffers 304508928 No Buffer Cache Size 118111600640 Yes Shared Pool Size 19327352832 Yes Large Pool Size 3758096384 Yes Java Pool Size 3758096384 Yes Streams Pool Size 21474836480 Yes Shared IO Pool Size 0 Yes Granule Size 536870912 No Maximum SGA Size 171031724032 No Startup overhead in Shared Pool 9093687064 No NAME BYTES RESIZE ---------------------------------------------------------------- --------------- ------ Free SGA Memory Available 4294967296
Note:
SGA手动管理,当前streams pool 在20G左右
检查4031 trace 中的streams pool最大内存区
==============================================
TOP 10 MEMORY USES FOR STREAMS POOL
----------------------------------------------
"apply shared t " 18 GB 88%
"free memory " 2401 MB 12%
"apply shared transactions " 116 KB 0%
"recov_kgqbtctx " 110 KB 0%
"Sender info " 60 KB 0%
"kwqbcqini:spilledovermsgs " 29 KB 0%
"kwqbsinfy:bqg " 13 KB 0%
"recov_kgqmctx " 10 KB 0%
"kwqbsinfy:mpr " 5 KB 0%
"kwqbsinfy:sta " 5 KB 0%
TOTALS ---------------------------------------
Total free memory 2401 MB
Total memory alloc. 18 GB
Grand total 20 GB
==============================================
Memory Utilization of Subpool 1 (STREAMS POOL)
==============================================
Allocation Name Size
___________________________ ____________
"free memory " 2517842720
"miscellaneous " 0
"so_knlsga " 3800
"kwqbcqini:spilledovermsgs" 30096
"deqtree_kgqmctx " 1368
"time manager index " 1368
"qulptr_kwqbscc " 0
"apply shared t " 18956720952
"substree_kgqmctx " 1368
"spilled:kwqbl " 2944
"recov_kgqbtctx " 113088
"sob_kgqmrec " 0
"recov_kgqmctx " 10488
"recov_kgqmsub " 0
"Sender info " 61656
"kwqpepseq " 0
"knlso:so " 0
"kgqmsub " 0
"name_kgqmsub " 0
"msgtree_kgqmctx " 1368
"apply shared transactions" 119016
"heap_knlvpso " 0
"fixed allocation callback" 448
"Free sender info segarray" 0
"image handles " 1584
"kgqbt_alloc_block " 0
"kodpaih3 image " 0
"kwqbsinfy:bqg " 14184
"kwqbsinfy:sta " 5376
"kwqbsinfy:cco " 376
"kwqbsinfy:bms " 1408
"kwqbsinfy:mpr " 5984
"kwqbcfsg-fr1 " 0
"kgqmdm_fl_1 " 792
==============================================
Memory Utilization of Subpool 2 (STREAMS POOL)
==============================================
Allocation Name Size
___________________________ ____________
==============================================
Memory Utilization of Subpool 3 (STREAMS POOL)
==============================================
Allocation Name Size
___________________________ ____________
==============================================
Memory Utilization of Subpool 4 (STREAMS POOL)
==============================================
Allocation Name Size
___________________________ ____________
==============================================
Memory Utilization of Subpool 5 (STREAMS POOL)
==============================================
Allocation Name Size
___________________________ ____________
==============================================
Memory Utilization of Subpool 6 (STREAMS POOL)
==============================================
Allocation Name Size
___________________________ ____________
==============================================
Memory Utilization of Subpool 7 (STREAMS POOL)
==============================================
Allocation Name Size
note:
仅有一个subpool, 且近90%为”apply shared t”. 如果没有生成或手动输出heap使用,可以使用如下heapdump.
Heap “apply shared t” Keep On Increasing Which Configured Goldengate Integrated Process (Doc ID 2995731.1)
connect / as sysdba oradebug setmypid oradebug unlimit oradebug dump heapdump 536870976 oradebug tracefile_name oradebug close_trace
SGA breakdown difference
- ordered by Pool, Name
- N/A value for Begin MB or End MB indicates the size of that Pool/Name was insignificant, or zero in that snapshot
Pool | Name | Begin MB | End MB | % Diff |
---|---|---|---|---|
java | free memory | 3,584.00 | 3,584.00 | 0.00 |
large | free memory | 3,575.81 | 3,575.81 | 0.00 |
shared | ASH buffers | 192.00 | 192.00 | 0.00 |
shared | ASM extent pointer array | 416.41 | 416.35 | -0.02 |
shared | Checkpoint queue | 468.80 | 468.80 | 0.00 |
shared | FileOpenBlock | 486.40 | 486.40 | 0.00 |
shared | KGLH0 | 2,768.91 | 2,744.05 | -0.90 |
shared | KGLHD | 457.91 | 452.98 | -1.08 |
shared | SQLA | 3,407.29 | 3,410.39 | 0.09 |
shared | db_block_hash_buckets | 712.00 | 712.00 | 0.00 |
shared | free memory | 996.88 | 1,010.43 | 1.36 |
shared | gc name table | 480.00 | 480.00 | 0.00 |
shared | gcs resources | 3,167.77 | 3,167.77 | 0.00 |
shared | gcs shadows | 2,193.07 | 2,193.07 | 0.00 |
shared | ges resource | 189.20 | 189.20 | 0.00 |
shared | kglsim object batch | 267.25 | 267.25 | 0.00 |
streams | apply shared t | 18,138.56 | 15,051.64 | -17.02 |
streams | free memory | 2,341.18 | 5,428.10 | 131.85 |
buffer_cache | 112,640.00 | 112,640.00 | 0.00 | |
fixed_sga | 2.16 | 2.16 | 0.00 | |
log_buffer | 290.40 | 290.40 | 0.00 |
Note:
问题时apply shared t也释放出来近3G,说明内存还并非一直增加,也有可能是因为进程的退出。
问题时间DB Alert 报错
WARNING: inbound connection timed out (ORA-3136) Thu Jan 09 02:47:25 2025 Errors in file /u01/app/oracle/diag/rdbms/weejar/weejar1/trace/weejar1_as12_33229.trc (incident=1394870): ORA-04031: unable to allocate 50024 bytes of shared memory ("streams pool","unknown object","apply shared t","commbuf_knasctx[0]") Incident details in: /u01/app/oracle/diag/rdbms/weejar/weejar1/incident/incdir_1394870/weejar1_as12_33229_i1394870.trc
虽然还有2G free,可能是存在一些碎片。该库的大事务比较多。从当时的AWR 后台进程wait 判断
Background Wait Events
Event | Waits | %Time -outs | Total Wait Time (s) | Avg wait (ms) | Waits /txn | % bg time |
Streams apply: waiting for dependency | 769,467 | 36 | 4,689 | 6 | 0.22 | 25.57 |
如果出现 WAIT DEPENDENCY wait event.说明存在较大大事务(more than 999 lcrs/tx)。或是应用的表上有外键无索引。
可以使用 EAGER_SIZE 参数调整 Integrated Replicat 性能
如果事务更改了表中超过 15100 行,Goldengate 会认为该事务很大(在 12.2 版中已更改。在早期版本中,该值是 9500),一个重要的参数强制 Goldengate 如何应用这些“大型”事务。它被称为 EAGER_SIZE。本质上对于 Oracle GoldenGate 而言,这意味着当我在事务中看到大量 LCR 时,我是否立即开始应用它们(我猜这就是参数名称的“eager”部分派生的地方)还是等待整个事务提交然后才开始应用更改。这种“等待”似乎使应用过程序列化,并在很大程度上增加了目标的应用滞后。
如由于最大事务的大小现在是 20,000 行,我们需要将 EAGER_SIZE 设置为比该更高的值。请注意,增加 EAGER_SIZE 会对 STREAMS_POOL_SIZE 提出额外的内存要求。
DBOPTIONS INTEGRATEDPARAMS(parallelism 8,EAGER_SIZE 25000)
解决方法
该环境的OGG是18.1, 数据库为11.2.0.4,有近20个IR 应用进程组。 当前的现象未发现匹配已知BUG. 基本原因是因为数据库的Streams Pool 耗尽,出现了ORA-4031 错误,影响了因为使用Stream的进程异常,甚至在创建进程时失败。如上面日志中的knasinit函数调用。
大多数情况下,如果在streams pool中出现 ORA-04031 错误,这表明该池对于工作负载来说太小了。发生在当大的事务时,没有足够的内存申请。
方案一,增加stream pool大小 (推荐)
正常情况下
stream_pool_size = (1.5 GB * # of integrated Replicats) + 25% head room
如20个进程进程
(1.5 GB * 20) * 1.25 = 37.5 GB STREAMS_POOL_SIZE =38 G
方案二,增加stream pool 使用上限参数, 降低MAX_SGA_SIZE使用
集成Replicat的Streams池的大小需求基于一个参数MAX_SGA_SIZE。MAX_SGA_SIZE参数默认为INFINITE,允许Replicat进程使用尽可能多的Streams池。Oracle不建议设置MAX_SGA_SIZE参数。 单位MB, Replicat参数文件中的max_sga_size指的是要从streams_pool_size中分配的内存部分, 不是DB instance的SGA.
默认情况下,每个集成提取过程都要求日志挖掘服务器以 1-2G 的 MAX_SGA_SIZE 运行。在配置集成提取时,我们使用以下参数.
DBOPTIONS INTEGRATEDPARAMS(MAX_SGA_SIZE 1024)
在 GoldenGate 集成模式环境中调整 STREAMS POOL SIZE 的大小时,我们需要考虑三种情况:
- 如果STREAMS_POOL_SIZE大于1G,则max_sga_size等于1G,否则将自动占用STREAMS_POOL_SIZE的75%。
- 如果未设置 STEAMS_POOL_SIZE,则 MAX_SGA_SIZE(流大小)将占用 SHARED_POOL_SIZE 的 10%,最大为 1 GB。
- 因此,我们需要考虑 SHARED POOL 的大小。如果您正在使用动态 SGA,则可以通过查询视图 V$SGA_DYNAMIC_COMPONENTS 获取 STREAMS POOL 的当前利用率。使用此视图,您可以了解 SGA 所有内存组件的当前使用大小。
下调参数可能会影响ogg apply变慢,但是可以控制错误的发生.
方案三,改用parallel 非集成模式应用
OGG 18支持parallel 非集成模式,在parallel 非集成模式不使用streams pool. 而是拆成多进进程并行,以SQL形式运行。
但是需要IR进程重建。
Known issue:
Bug 25209912 ogg-db: Integrated Parallel Replicat repeatedly abends with batch txns due to ORA-4031-streams pool leak –ogg 12.2
Bug 8857940 Enhancement to group durations to help reduce chance of ORA-4031 — db 11.2 duration
Starting OGG Integrated Replicat Leads To Database Crashes With ORA-04031: unable to allocate 3280 bytes of shared memory (“streams pool”,”unknown object”,”streams pool”,”sob_kgqmrec”) (Doc ID 2432457.1) Replicat was configured with max_sga_size value greater than the database SGA size
GoldenGate Integrated REPLICAT appears to be hung: BATCHSQL with large “apply shared t” in the Streams Pool (Doc ID 1985152.1)
Bug 26354017 exponantial growth apply shared t component — ogg 12.1.2
Heap “apply shared t” Keep On Increasing Which Configured Goldengate Integrated Process (Doc ID 2995731.1) ogg 19.1.0.0.0 and later
— over —
上一篇:
目前这篇文章还没有评论(Rss)