首页 » OGG » skip a transaction in goldengate(跳过一个事务OGG)

skip a transaction in goldengate(跳过一个事务OGG)

我们现在用OGG做两个ORACLE OLTP数据库的A-A同步含DDL,刚发现Replicat进程ABENDING了,下面分析一下原因

ggserr.log日志

2012-10-31 17:09:05  WARNING OGG-00869  Oracle GoldenGate Delivery for Oracle, ricme.prm:  OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0.
2012-10-31 17:09:05  WARNING OGG-01004  Oracle GoldenGate Delivery for Oracle, ricme.prm:  Aborted grouped transaction on 'ICME.ICME_STUDENT', Database error 2292 (OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0).
2012-10-31 17:09:05  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, ricme.prm:  Repositioning to rba 84509907 in seqno 40.
2012-10-31 17:09:05  WARNING OGG-01154  Oracle GoldenGate Delivery for Oracle, ricme.prm:  SQL error 2292 mapping ICME.ICME_STUDENT to ICME.ICME_STUDENT OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0.
2012-10-31 17:09:05  WARNING OGG-01003  Oracle GoldenGate Delivery for Oracle, ricme.prm:  Repositioning to rba 84509907 in seqno 40.
2012-10-31 17:09:05  ERROR   OGG-01296  Oracle GoldenGate Delivery for Oracle, ricme.prm:  Error mapping from ICME.ICME_STUDENT to ICME.ICME_STUDENT.
2012-10-31 17:09:05  ERROR   OGG-01668  Oracle GoldenGate Delivery for Oracle, ricme.prm:  PROCESS ABENDING.

在日志中能看出大概SQL,我的replicat group配置文件配置了DiscardFile 记录了image

[oracle@ggsdb dirrpt]$ vi ricme.dsc
OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated – child record found (status = 2292). UPDATE “ICME”.”ICME_STUDENT” SET “IC_COD
E” = :a1,”REMARK” = :a2,”MODIFY_TIME” = :a3 WHERE “IC_CODE” = :b0
Aborting transaction on dirdat/l2 beginning at seqno 40 rba 84509907
error at seqno 40 rba 84509907
Problem replicating ICME.ICME_STUDENT to ICME.ICME_STUDENT
Mapping problem with compressed key update record (target format)…
*
IC_CODE = 1114020AY
IC_CODE = 3
REMARK =
000000: bf a8 ba c5 d6 d8 b8 b4

看到这个sql,我确认了修改内容,问了下同事果然是失误操作,修改了学员卡号,而那个卡号上是有trigger,会级连修改好多相关表,而且有外键约束,但从库上的trigger是disable的,所以就遇到了外键约束导致备库更新失败,不过后来同事又修改回来了,数据上在主库是还原了的,那我可以来跳过此事务

首先先找到replicat进程当前应用到的rba,也就是csn(commit sequence number),在oracle中的scn,来定位下次应用的起始RBA,它就是在trail文件中下一次replicat 进程将要fseek() call 并起动进程的位置(actual byte position )

GGSCI (ggsdb) 4> info all

Program     Status      Group       Lag at Chkpt  Time Since Chkpt

MANAGER     RUNNING                                           
REPLICAT    ABENDED     RICME       00:00:00      00:29:41    


GGSCI (ggsdb) 5> info rep ricme

REPLICAT   RICME     Last Started 2012-10-31 17:23   Status ABENDED
Checkpoint Lag       00:00:00 (updated 00:29:47 ago)
Log Read Checkpoint  File dirdat/l2000040
                     2012-10-31 17:08:56.879106  RBA 84509907

通过上面的信息我们知道了replicat进程ricme group 下在应用到了dirdat/l2000040的RBA 84509907,我们想跳过这个事务应用下一条记录就可以,但是可不是简单的在当前的RBA上加1,RBA必须是有OGG格式过的,如果输入的是无效地址启动后EXCEPTION会记录到ggserr.log中,我们可以用OGG安装目录下的logdump工具来定位下一条记录的“真正”位置


[oracle@ggsdb ogg11r2]$ ./logdump

Oracle GoldenGate Log File Dump Utility for Oracle
Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230

Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved.

 
Logdump 1 >open dirdat/l2000040
Current LogTrail is /oracle/ogg11r2/dirdat/l2000040 
Logdump 2 >pos 84509907
Reading forward from RBA 84509907 
Logdump 3 >n

2012/10/31 17:08:58.914.149 GGSPKUpdate          Len    69 RBA 84509907 
Name: ICME.ICME_STUDENT 
After  Image:                                             Partition 4   G  b   
 0011 0000 000d 0000 0009 3131 3134 3032 3041 5900 | ..........1114020AY.  
 0000 0500 0000 0133 0018 000c 0000 0008 bfa8 bac5 | .......3............  
 d6d8 b8b4 001d 0015 0000 3230 3132 2d31 302d 3331 | ..........2012-10-31  
 3a31 373a 3034 3a33 39                            | :17:04:39  
   
Logdump 4 >n

2012/10/31 17:08:58.914.149 FieldComp            Len    23 RBA 84510103 
Name: ICME.ICME_PROJECT_SCORE 
After  Image:                                             Partition 4   G  m   
 0000 000a 0000 0000 0000 0252 1521 0001 0005 0000 | ...........R.!......  
 0001 33                                           | ..3  
   
Logdump 5 >exit

pos是position的缩写,意思是定位到replicat启始的位置,n是next的缩写,第一个n定位显示出当前应用的记录,可以看出是update 还有表的名字,还有image的值,我们要跳过这个事务当然要再输一个n,可以看到下一个记录的rba是 84510103绝不是前面RBA简单的加1.这样我们就可以修改replicat进程启动时的rba指定为84510103

GGSCI (ggsdb) 1> alter replicat ricme, extrba 84510103
REPLICAT altered.
GGSCI (ggsdb) 3> start ricme

Sending START request to MANAGER ...
REPLICAT RICME starting

当然如果还有失败的事务还可以继续next用上面的方法,不过如果有几个连续的事务需要skip,那就可以用另外一个方法

start rep ricme skiptransaction

不过跳过的事务数是未知的,同样也会记录到discard文件中,如果参数中配置了。

打赏

对不起,这篇文章暂时关闭评论。