skip a transaction in goldengate(跳过一个事务OGG)
我们现在用OGG做两个ORACLE OLTP数据库的A-A同步含DDL,刚发现Replicat进程ABENDING了,下面分析一下原因
ggserr.log日志
2012-10-31 17:09:05 WARNING OGG-00869 Oracle GoldenGate Delivery for Oracle, ricme.prm: OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0. 2012-10-31 17:09:05 WARNING OGG-01004 Oracle GoldenGate Delivery for Oracle, ricme.prm: Aborted grouped transaction on 'ICME.ICME_STUDENT', Database error 2292 (OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0). 2012-10-31 17:09:05 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, ricme.prm: Repositioning to rba 84509907 in seqno 40. 2012-10-31 17:09:05 WARNING OGG-01154 Oracle GoldenGate Delivery for Oracle, ricme.prm: SQL error 2292 mapping ICME.ICME_STUDENT to ICME.ICME_STUDENT OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated - child record found (status = 2292). UPDATE "ICME"."ICME_STUDENT" SET "IC_CODE" = :a1,"REMARK" = :a2,"MODIFY_TIME" = :a3 WHERE "IC_CODE" = :b0. 2012-10-31 17:09:05 WARNING OGG-01003 Oracle GoldenGate Delivery for Oracle, ricme.prm: Repositioning to rba 84509907 in seqno 40. 2012-10-31 17:09:05 ERROR OGG-01296 Oracle GoldenGate Delivery for Oracle, ricme.prm: Error mapping from ICME.ICME_STUDENT to ICME.ICME_STUDENT. 2012-10-31 17:09:05 ERROR OGG-01668 Oracle GoldenGate Delivery for Oracle, ricme.prm: PROCESS ABENDING.
在日志中能看出大概SQL,我的replicat group配置文件配置了DiscardFile 记录了image
[oracle@ggsdb dirrpt]$ vi ricme.dsc
OCI Error ORA-02292: integrity constraint (ICME.FK_NOPROSCORE_TO_STU) violated – child record found (status = 2292). UPDATE “ICME”.”ICME_STUDENT” SET “IC_COD
E” = :a1,”REMARK” = :a2,”MODIFY_TIME” = :a3 WHERE “IC_CODE” = :b0
Aborting transaction on dirdat/l2 beginning at seqno 40 rba 84509907
error at seqno 40 rba 84509907
Problem replicating ICME.ICME_STUDENT to ICME.ICME_STUDENT
Mapping problem with compressed key update record (target format)…
*
IC_CODE = 1114020AY
IC_CODE = 3
REMARK =
000000: bf a8 ba c5 d6 d8 b8 b4
看到这个sql,我确认了修改内容,问了下同事果然是失误操作,修改了学员卡号,而那个卡号上是有trigger,会级连修改好多相关表,而且有外键约束,但从库上的trigger是disable的,所以就遇到了外键约束导致备库更新失败,不过后来同事又修改回来了,数据上在主库是还原了的,那我可以来跳过此事务
首先先找到replicat进程当前应用到的rba,也就是csn(commit sequence number),在oracle中的scn,来定位下次应用的起始RBA,它就是在trail文件中下一次replicat 进程将要fseek() call 并起动进程的位置(actual byte position )
GGSCI (ggsdb) 4> info all Program Status Group Lag at Chkpt Time Since Chkpt MANAGER RUNNING REPLICAT ABENDED RICME 00:00:00 00:29:41 GGSCI (ggsdb) 5> info rep ricme REPLICAT RICME Last Started 2012-10-31 17:23 Status ABENDED Checkpoint Lag 00:00:00 (updated 00:29:47 ago) Log Read Checkpoint File dirdat/l2000040 2012-10-31 17:08:56.879106 RBA 84509907
通过上面的信息我们知道了replicat进程ricme group 下在应用到了dirdat/l2000040的RBA 84509907,我们想跳过这个事务应用下一条记录就可以,但是可不是简单的在当前的RBA上加1,RBA必须是有OGG格式过的,如果输入的是无效地址启动后EXCEPTION会记录到ggserr.log中,我们可以用OGG安装目录下的logdump工具来定位下一条记录的“真正”位置
[oracle@ggsdb ogg11r2]$ ./logdump Oracle GoldenGate Log File Dump Utility for Oracle Version 11.2.1.0.1 OGGCORE_11.2.1.0.1_PLATFORMS_120423.0230 Copyright (C) 1995, 2012, Oracle and/or its affiliates. All rights reserved. Logdump 1 >open dirdat/l2000040 Current LogTrail is /oracle/ogg11r2/dirdat/l2000040 Logdump 2 >pos 84509907 Reading forward from RBA 84509907 Logdump 3 >n 2012/10/31 17:08:58.914.149 GGSPKUpdate Len 69 RBA 84509907 Name: ICME.ICME_STUDENT After Image: Partition 4 G b 0011 0000 000d 0000 0009 3131 3134 3032 3041 5900 | ..........1114020AY. 0000 0500 0000 0133 0018 000c 0000 0008 bfa8 bac5 | .......3............ d6d8 b8b4 001d 0015 0000 3230 3132 2d31 302d 3331 | ..........2012-10-31 3a31 373a 3034 3a33 39 | :17:04:39 Logdump 4 >n 2012/10/31 17:08:58.914.149 FieldComp Len 23 RBA 84510103 Name: ICME.ICME_PROJECT_SCORE After Image: Partition 4 G m 0000 000a 0000 0000 0000 0252 1521 0001 0005 0000 | ...........R.!...... 0001 33 | ..3 Logdump 5 >exit
pos是position的缩写,意思是定位到replicat启始的位置,n是next的缩写,第一个n定位显示出当前应用的记录,可以看出是update 还有表的名字,还有image的值,我们要跳过这个事务当然要再输一个n,可以看到下一个记录的rba是 84510103绝不是前面RBA简单的加1.这样我们就可以修改replicat进程启动时的rba指定为84510103
GGSCI (ggsdb) 1> alter replicat ricme, extrba 84510103 REPLICAT altered. GGSCI (ggsdb) 3> start ricme Sending START request to MANAGER ... REPLICAT RICME starting
当然如果还有失败的事务还可以继续next用上面的方法,不过如果有几个连续的事务需要skip,那就可以用另外一个方法
start rep ricme skiptransaction
不过跳过的事务数是未知的,同样也会记录到discard文件中,如果参数中配置了。
对不起,这篇文章暂时关闭评论。