Goldengate Integrated Extract集成模式抽取挂起?
最近一套ogg环境反馈做了些磁盘维护后抽取进程abended, 查看ggs日志发现有提示OGG-00662 OCI ERROR ora-03113 ora-03115和OGG-02078 extract encountered a fatal error错误,启动后使用ggsci info ext xx查看RBA无变化,简单记录一下该案例。
什么是Goldengate Integrated Extract
Oracle 从 OGG 11.2.x 引入了 Integrated Extract 或 Integrated Capture。它们将此捕获与 Oracle RDBMS 绑定,因此称为集成捕获。相对应的Classic Capture 与 Oracle 数据库分离,而 Integrated Capture 与 Oracle 数据库紧密耦合。在这里,不是直接从数据库中捕获事务的提取过程,而是 Logmining Server 扮演的角色。LogMining Server将与数据库中的此集成捕获过程一起附加,以挖掘redo log或archivelog并捕获事务。
集成模式的抽取有以下子进程:
READER – 读取日志文件并拆分为区域
PREPARER – 根据提取参数扫描日志文件和预过滤器的区域
BUILDER – 按 SCN 顺序合并准备好的记录
CAPTURE – 格式化逻辑变更记录 (LCR) 并传递给 Oracle GoldenGate Extract
Oracle GoldenGate 支持所有 Oracle 数据和存储类型。但要做到这一点,源数据库应至少为 11.2.0.3,低的版本需要安装一些补丁。集成捕获此过程主要针对流概念。会占用或消耗STREAMS_POOL_SIZE内存.
案例
因为日志中看到了ora-3113 ora-3115怀疑是数据库实例发生了重启,从数据库v$instance.START_TIME确认DB实例确实刚刚重启。
ggsci> info xx , detail
Integrated Extract outbound server first scn: Unavailable.Need DBLOGIN.
Integrated Extract outbound server filtering start scn: Unavailable.Need DBLOGIN.
ggsci> DBLOGIN xxx ggsci> stop mgr ggsci> start mgr ggsci> send extract xx , forcestop ggsci> start extract xx ggsci> view report xx 2023-12-19 18:14:07 INFO OGG-01639 BOUNDED RECOVERY: ACTIVE: for object pool 1: p93305_extr. 2023-12-19 18:14:07 INFO OGG-01640 BOUNDED RECOVERY: recovery start XID: 1.45.58.16824898. 2023-12-19 18:14:07 INFO OGG-01641 BOUNDED RECOVERY: recovery start position: SeqNo: 717484, RBA: 1394662416, SCN: 3254.3031471228 (13978855052412), Timestamp: 2023-12-19 14:54:13.000000. 2023-12-19 18:14:07 INFO OGG-01642 BOUNDED RECOVERY: recovery end position: SeqNo: 916180, RBA: 405960548, SCN: 3254.3050473505 (13978874054689), Timestamp: 2023-12-19 15:45:22.000000, Thread: 1. 2023-12-19 18:14:07 INFO OGG-01579 BOUNDED RECOVERY: VALID BCP: CP.EX_OL.000016487. 2023-12-19 18:14:07 INFO OGG-01629 BOUNDED RECOVERY: PERSISTED OBJECTS RECOVERED: 21. Database Version: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production PL/SQL Release 11.2.0.4.0 - Production GGSCI (weejar1 as ggs@weejar1) 143> info ext_anb, detail EXTRACT ext_anb Last Started 2023-12-19 18:07 Status RUNNING Checkpoint Lag 01:01:49 (updated 00:16:59 ago) Process ID 21074 Log Read Checkpoint Oracle Integrated Redo Logs 2023-12-19 17:05:40 SCN 3254.3079387801 (13978902968985) GGSCI (weejar1 as ggs@weejar1) 151> info ext_anb, showch EXTRACT ext_anb Last Started 2023-12-19 18:07 Status RUNNING Checkpoint Lag 01:29:49 (updated 00:00:02 ago) Process ID 21074 Log Read Checkpoint Oracle Integrated Redo Logs 2023-12-19 17:05:40 SCN 3254.3079387801 (13978902968985) Current Checkpoint Detail: Read Checkpoint #1 Oracle Integrated Redo Log Startup Checkpoint (starting position in the data source): Timestamp: 2020-03-09 16:19:45.000000 SCN: Not available Recovery Checkpoint (position of oldest unprocessed transaction in the data source): Timestamp: 2023-04-07 16:45:19.000000 SCN: 3219.225976818 (13825725702642) Current Checkpoint (position of last record read in the data source): Timestamp: 2023-12-19 17:05:40.000000 SCN: 3254.3079387801 (13978902968985) BR Previous Recovery Checkpoint: Timestamp: 2023-12-13 12:33:41.000000 SCN: 3253.4196616217 (13975725230105) BR Begin Recovery Checkpoint: Timestamp: 2023-12-19 14:54:13.000000 SCN: 3254.3031471228 (13978855052412) BR End Recovery Checkpoint: Timestamp: 2023-12-19 15:46:04.000000 SCN: 3254.3050736660 (13978874317844) Write Checkpoint #1 GGS Log Trail Current Checkpoint (current write position): Sequence #: 109303 RBA: 43508204 Timestamp: 2023-12-19 17:05:42.152798 Extract Trail: ./dirdat/yms/ar Seqno Length: 9 Flip Seqno Length: No Trail Type: EXTTRAIL Header: Version = 2 Record Source = A Type = 13 # Input Checkpoints = 1 # Output Checkpoints = 1 File Information: Block Size = 2048 Max Blocks = 100 Record Length = 2048 Current Offset = 0 Configuration: Data Source = 3 Transaction Integrity = 1 Task Type = 0 Status: Start Time = 2023-12-19 18:07:56 Last Update Time = 2023-12-19 18:35:29 Stop Status = A Last Result = 520 GGSCI (weejar1 as ggs@weejar1) 152> info ext_anb, showch EXTRACT ext_anb Last Started 2023-12-19 18:07 Status RUNNING Checkpoint Lag 01:31:10 (updated 00:00:01 ago) Process ID 21074 Log Read Checkpoint Oracle Integrated Redo Logs 2023-12-19 17:05:40 SCN 3254.3079387801 (13978902968985) Current Checkpoint Detail: Read Checkpoint #1 Oracle Integrated Redo Log Startup Checkpoint (starting position in the data source): Timestamp: 2020-03-09 16:19:45.000000 SCN: Not available Recovery Checkpoint (position of oldest unprocessed transaction in the data source): Timestamp: 2023-04-07 16:45:19.000000 SCN: 3219.225976818 (13825725702642) Current Checkpoint (position of last record read in the data source): Timestamp: 2023-12-19 17:05:40.000000 SCN: 3254.3079387801 (13978902968985) BR Previous Recovery Checkpoint: Timestamp: 2023-12-13 12:33:41.000000 SCN: 3253.4196616217 (13975725230105) BR Begin Recovery Checkpoint: Timestamp: 2023-12-19 14:54:13.000000 SCN: 3254.3031471228 (13978855052412) BR End Recovery Checkpoint: Timestamp: 2023-12-19 15:46:04.000000 SCN: 3254.3050736660 (13978874317844) Write Checkpoint #1 GGS Log Trail Current Checkpoint (current write position): Sequence #: 109303 RBA: 43508204 Timestamp: 2023-12-19 17:05:42.152798 Extract Trail: ./dirdat/yms/ar Seqno Length: 9 Flip Seqno Length: No Trail Type: EXTTRAIL Header: Version = 2 Record Source = A Type = 13 # Input Checkpoints = 1 # Output Checkpoints = 1 File Information: Block Size = 2048 Max Blocks = 100 Record Length = 2048 Current Offset = 0 Configuration: Data Source = 3 Transaction Integrity = 1 Task Type = 0 Status: Start Time = 2023-12-19 18:07:56 Last Update Time = 2023-12-19 18:36:50 Stop Status = A Last Result = 520
Bounded Recovery是通用提取检查点工具的一个组件。它保证了在Extract因任何原因(计划内或计划外)停止后的有效恢复,无论Extract停止时有多少打开(未提交)事务,也无论它们有多长时间。当Extract遇到事务的提交记录时,它将整个缓存的事务写入到trail中,并将其从内存中清除。当Extract遇到事务的回滚记录时,它将从内存中丢弃整个事务。在Extract处理提交或回滚之前,事务被认为是打开的,并继续收集其信息。如果Extract在遇到事务的提交或回滚记录之前停止,则在Extract再次启动时必须恢复所有缓存的信息。这适用于Extract停止时打开的所有事务。<OGG(12.3) Extract long time lag after Oracle RAC a instance Crash>之前记录过一个故障 .
Extract执行如下恢复操作:
如果在Extract停止时没有打开的事务,则从当前Extract读取检查点开始恢复。这是一次正常的recovery。
如果存在打开的事务,其日志中的起始点在时间上与Extract停止的时间非常接近,则Extract将通过从最老的打开事务开始重新读取日志来开始恢复。这就要求Extract为那些在Extract停止之前已经写入trail或丢弃的事务执行冗余工作,但是考虑到要处理的数据量相对较少,这些工作是可以接受的成本。这也被认为是正常的recovery。
如果Extract有一个或多个事务符合长时间运行的打开事务的条件,则Extract使用 Bounded Recovery开始其recovery。
在每个Bounded Recovery interval,Extract生成一个Bounded Recovery checkpoint,它将Extract的当前状态和数据持久化到磁盘,包括长时间运行的事务的状态和数据,如果Extract在Bounded Recovery checkpoint后停止,它将从上一个Bounded Recovery interval的位置或最后一个Bounded Recovery checkpoint进行恢复,而不是从最早的打开的长时间运行事务首次出现的日志位置进行处理。 maximum Bounded Recovery time(Extract恢复到停止位置的最大时间)不超过当前Bounded Recovery checkpoint interval的两倍。
简而言之,如果一个长事务一直没有结束,而OGG又是实时解析已commited的事务,不会等未提交的事务,而如是长事务结束前ogg extract要关闭重启,cache里的信息就会被丢弃,下次启动时如果从事务最开始取解析redo或很可能是归档,就导致反复读,因为这些日志之前已提交的事务已经解析并写入trail文件,效率有点低,所以引入了BR, 通过BR interval 参数控制多长时间做一次checkpoint, 而checkpoint完成后,之前开始的长事务在下次重启时,就不再需要从之前的归档开始,甚至不再需要之前的归档,优先读取BR文件。回想第一次接触BR好像是在9年前,BR文件出现问题导致extract挂起。
查看Recover的位置
GGSCI (weejar1 as ggs@weejar1) 155> send ex_anb status Sending STATUS request to EXTRACT ex_anb ... EXTRACT ex_anb (PID 30373) Current status: In recovery[1]: Processing data Current read position: Redo thread #: 2 Sequence #: 717581 RBA: 186794744 Timestamp: 2023-12-19 16:30:56.000000 SCN: 3254.3066340389 (13978889921573) Current write position: Sequence #: 87327 RBA: 197816537 Timestamp: 2023-12-19 17:05:32.876775 Extract Trail: ./dirdat/yms/ol GGSCI (weejar1 as ggs@weejar1) 156> ! send ex_anb status Sending STATUS request to EXTRACT ex_anb ... EXTRACT ex_anb (PID 30373) Current status: In recovery[1]: Processing data Current read position: Redo thread #: 1 Sequence #: 916250 RBA: 102453108 Timestamp: 2023-12-19 16:33:02.000000 SCN: 3254.3066980486 (13978890561670) Current write position: Sequence #: 87327 RBA: 197816537 Timestamp: 2023-12-19 17:05:32.876775 Extract Trail: ./dirdat/yms/ol
Note:
可看read Timestamp和SCN在变化,等待恢复完成即可.
References
https://docs.oracle.com/en/middleware/goldengate/core/21.3/reference/br.html
对不起,这篇文章暂时关闭评论。