Lots of Long transaction caused by database link, and undo hdr show DBA for that slot is 0x00000000
部署GOLDENGATE时发现,当前库中存在较多的长事务,在v$transaction中显示状态一直是ACTIVE, 对于长事务的对OGG的BR或启动抽取位置有较大影响, 奇怪的是这些长事务的起动时间甚至都有3天以上,而且当前会话状态已是INACITVE.而且查看UNDO SEGMENT HEADER上对应的SLOT 的DBA是0x00000000。
ANBOB@SQL> SELECT xidusn, 2 xidslot, 3 xidsqn, 4 ubafil, 5 ubablk, 6 ubasqn, 7 ubarec, 8 t.status, 9 start_time, 10 start_ubafil, 11 start_ubablk, 12 start_ubasqn, 13 start_ubarec, 14 ses_addr, 15 used_ublk, 16 used_urec, 17 s.sql_id,sid 18 FROM v$transaction t,v$session s 19 WHERE addr = '0700000A0D6B1308' and T.addr=S.TADDR 20 ; XIDUSN XIDSLOT XIDSQN UBAFIL UBABLK UBASQN UBAREC STATUS ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- 2880 51 2938814 0 0 0 0 ACTIVE START_TIME START_UBAFIL START_UBABLK START_UBASQN START_UBAREC SES_ADDR -------------------- ------------ ------------ ------------ ------------ ---------------- 08/23/16 05:38:03 663 410489 23081 50 0700000A0199F210 USED_UBLK USED_UREC SQL_ID SID ---------- ---------- ------------- ---------- 1 1 3331 ANBOB@SQL> SELECT 2 ktuxeusn usn# -- 65535 = no-undo transaction 3 , ktuxeslt slot# -- 65535 = invalid slot# 4 , ktuxesqn seq# 5 , ktuxesta status 6 , ktuxecfl flags 7 , ktuxesiz undo_blks 8 , ktuxerdbf curfile 9 , ktuxerdbb curblock 10 , ktuxescnw * power(2, 32) + ktuxescnb cscn -- commit/prepare commit SCN 11 , ktuxeuel 12 -- distributed xacts 13 --, ktuxeddbf r_rfile 14 --, ktuxeddbb r_rblock 15 , ktuxepusn r_usn# 16 , ktuxepslt r_slot# 17 , ktuxepsqn r_seq# 18 FROM 19 x$ktuxe 20 WHERE ktuxesta != 'INACTIVE' and ktuxeusn=2880 21 ORDER BY 22 ktuxeusn 23 , ktuxeslt 24 / USN# SLOT# SEQ# STATUS FLAGS UNDO_BLKS CURFILE CURBLOCK CSCN KTUXEUEL R_USN# R_SLOT# R_SEQ# ---------- ---------- ---------- ---------------- --------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- 2880 51 2938814 ACTIVE NONE 0 0 0 1.4942E+13 2 0 0 0
NOTE:
UBAFIL,UBABLK,UBAREC 都为0, USED_UBLK为1, 而且该事务一直是ACTIVE 状态, start_time是三天前的时间。 SQL ID为空,通常我们是找出这个事务相关的对象和事务类型, 下面dump undo segment header.
ANBOB@SQL> select segment_name from dba_rollback_segs where segment_id=2880; SEGMENT_NAME ------------------------------ _SYSSMU2880_2077687819$ ANBOB@SQL> ALTER SYSTEM DUMP UNDO HEADER '_SYSSMU2880_2077687819$'; System altered. index state cflags wrap# uel scn dba parent-xid nub stmt_num cmt ------------------------------------------------------------------------------------------------ 0x00 9 0x00 0x2d1821 0x0008 0x0d97.0fb42008 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901792 0x01 9 0x00 0x2d1820 0x0032 0x0d97.0fb4d3db 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901798 0x02 9 0x00 0x2d16df 0x002c 0x0d97.0f8e5c10 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901528 0x03 9 0x00 0x2d181e 0x003c 0x0d97.0faafce2 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901697 0x04 9 0x00 0x2d17ed 0x0037 0x0d97.0fb26ea1 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901771 0x05 9 0x00 0x2d17bc 0x0027 0x0d97.0fa19bc3 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901638 0x06 9 0x00 0x2d063b 0xffff 0x0d99.305693fb 0x00000000 0x0000.000.00000000 0x00000000 0x00000000 1472687730 ... 0x2e 9 0x00 0x2d1823 0x001b 0x0d97.0fbb6dd3 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901861 0x2f 9 0x00 0x2d1832 0x0038 0x0d97.b134f1bc 0x00000000 0x0000.000.00000000 0x00000000 0x00000000 1472122507 0x30 9 0x00 0x2d1811 0x003f 0x0d98.75d8e475 0x00000000 0x0000.000.00000000 0x00000000 0x00000000 1472424937 0x31 9 0x00 0x2d1550 0x003e 0x0d97.0fbddba4 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901882 0x32 9 0x00 0x2d181f 0x0044 0x0d97.0fb5ca7a 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901808 0x33 10 0x80 0x2cd7be 0x0002 0x0d97.0fbdee68 0x00000000 0x0000.000.00000000 0x00000000 0x00000000 0 0x34 9 0x00 0x2d182d 0x0013 0x0d97.0fafa64e 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901737 0x35 9 0x00 0x2d184c 0x003b 0x0d97.0fade00c 0xa5c64379 0x0000.000.00000000 0x00000001 0x00000000 1471901720 ANBOB@SQL> SELECT * FROM x$ktuxe WHERE ktuxeusn=2880 AND INDX=51; ADDR INDX INST_ID KTUXEUSN KTUXESLT KTUXESQN KTUXERDBF KTUXERDBB KTUXESCNB KTUXESCNW KTUXESTA ---------------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------------- 00000001109CDD30 51 1 2880 51 2938814 0 0 264105576 3479 ACTIVE KTUXECFL KTUXEUEL KTUXEDDBF KTUXEDDBB KTUXEPUSN KTUXEPSLT KTUXEPSQN KTUXESIZ ------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- NONE 2 0 0 0 0 0 0
NOTE:
注意到UNDO段头上记录的记录确为ACTIVE状态,但是找事务的最后一个UNDO block DBA时发现是0, 而且cmt 也为0, 关于undo seg的意义不在本篇笔记的范围后续会写,
结束没法找,下面尝试dump 事务开始时的undo block.
##### dump strat undo block ANBOB@SQL> alter system dump datafile 663 block 410489; System altered. ANBOB@SQL> oradebug setmypid Statement processed. ANBOB@SQL> oradebug tracefile_name /oracle/app/oracle/diag/rdbms/acctd/acctd1/trace/acctd1_ora_40173614.trc ******************************************************************************** UNDO BLK: xid: 0x0b40.033.002cd7be seq: 0x5a29 cnt: 0x32 irb: 0x32 icl: 0x0 flg: 0x0000 <<<<<<<<<<# USN# SLOT# SEQ# START_UBASQN Rec Offset Rec Offset Rec Offset Rec Offset Rec Offset --------------------------------------------------------------------------- 0x01 0x3f60 0x02 0x3ed8 0x03 0x3e00 0x04 0x3d48 0x05 0x3c70 0x06 0x3be8 0x07 0x3b1c 0x08 0x3a94 0x09 0x39bc 0x0a 0x3934 0x0b 0x3870 0x0c 0x37c0 0x0d 0x3738 0x0e 0x36b0 0x0f 0x35d8 0x10 0x3520 0x11 0x3470 0x12 0x33c0 0x13 0x3310 0x14 0x324c 0x15 0x3194 0x16 0x30d0 0x17 0x3020 0x18 0x2f98 0x19 0x2f10 0x1a 0x2e38 0x1b 0x2db0 0x1c 0x2d00 0x1d 0x2c4c 0x1e 0x2b74 0x1f 0x2aec 0x20 0x2a64 0x21 0x29dc 0x22 0x2954 0x23 0x28cc 0x24 0x27f4 0x25 0x276c 0x26 0x26e4 0x27 0x265c 0x28 0x25ac 0x29 0x2524 0x2a 0x2460 0x2b 0x23d8 0x2c 0x2320 0x2d 0x224c 0x2e 0x21c4 0x2f 0x213c 0x30 0x2088 0x31 0x2000 0x32 0x1fb0 <<<<<<# start_rec *----------------------------- * Rec #0x31 slt: 0x31 objn: 2171677(0x0021231d) objd: 2931186 tblspc: 27(0x0000001b) * Layer: 11 (Row) opc: 1 rci 0x00 Undo type: Regular undo Begin trans Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000Ext idx: 0 flg2: 0 *----------------------------- uba: 0xa5c64379.5a29.30 ctl max scn: 0x0d97.0f6f6c77 prv tx scn: 0x0d97.0f705634 txn start scn: scn: 0x0d97.0fbd52aa logon user: 389 prev brb: 2781234038 prev bcl: 0 KDO undo record: KTB Redo op: 0x04 ver: 0x01 compat bit: 4 (post-11) padding: 1 op: L itl: xid: 0x0b38.00f.00260322 uba: 0xa5c63636.5afb.4f flg: C--- lkc: 0 scn: 0x0d97.0fbd5270 KDO Op code: LKR row dependencies Disabled xtype: XA flags: 0x00000000 bdba: 0x704ab721 hdba: 0x704ab70a itli: 12 ispac: 0 maxfr: 9774 tabn: 0 slot: 14 to: 0 # find 0x32 *----------------------------- * Rec #0x32 slt: 0x33 objn: 0(0x00000000) objd: 0 tblspc: 0(0x00000000) <<<< * Layer: 5 (Transaction Undo) opc: 7 rci 0x00 Undo type: Regular undo Begin trans Last buffer split: No Temp Object: No Tablespace Undo: No rdba: 0x00000000Ext idx: 0 flg2: 0 *----------------------------- uba: 0xa5c64379.5a29.31 ctl max scn: 0x0d97.0f705634 prv tx scn: 0x0d97.0f720e6f txn start scn: scn: 0x0d97.0fbdee68 logon user: 389 prev brb: 2781234038 prev bcl: 0 2016-09-02 16:49:19.961415 : kjbmbassert [0x64379.297] 2016-09-02 16:49:19.961557 : kjbmsassert(0x64379.297)(2) End dump data blocks tsn: 1 file#: 663 minblk 410489 maxblk 410489
NOTE:
从该事务开始的undo block中undo 记录0x32(50)中发现只是一个transaction undo 类型objn 0 没有相关的对象,也没有DML的UNDO记录, 继续查看活动事务对应的session对锁的持有情况.
ANBOB@SQL> select * from v$lock where sid=3331;
ADDR KADDR SID TY ID1 ID2 LMODE REQUEST CTIME BLOCK
---------------- ---------------- ---------- -- ---------- ---------- ---------- ---------- ---------- ----------
0700000A21ED9FD8 0700000A21EDA030 3331 AE 2498633 0 4 0 1143242 2
0700000A0D6B1308 0700000A0D6B1380 3331 TX 188743731 2938814 6 0 1142147 2
ANBOB@SQL> select b.username,a.sid, trunc(a.id1/power(2,16)) rbs,bitand(a.id1,to_number('ffff','xxxx'))+0 slot,
a.id2 seq,a.lmode,a.request, a.block
from v$lock a,v$session b
where a.type = 'TX' and a.sid = b.sid and b.sid=3331;
USERNAME SID RBS SLOT SEQ LMODE REQUEST BLOCK
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ----------
ACCOUNT 3331 2880 51 2938814 6 0 2
Note:
该会话持有一个TX LMODE 6的锁,block=2 全局锁。当前库无分布式事务锁。
ANBOB@SQL> select * from dba_2pc_pending; no rows selected
其实这种现象是有分布式事务引起的如通过DBLINK的SQL,即使只是一个通过DATABASE LINK的的一个查询,数据库也会在本地使用一个UNDO block来标志这个分布式事务,并且持有一个TX lock, 因为本地节点不能知道远程节点的UNDO情况,所有undo段头上DBA为0 .
解决方法有两种
1,commit, rollback, logout 即使是一个select 语句通过DBLINK,这里不再演示。
2,如果你能确认通过DBLINK只是做查询,那样可以在事务开始set transaction read only; 然后再通过dblink 查询就不会产生这种长事务,下面演示这种。
SYS@SQL> set transaction read only; Transaction set. SYS@SQL> select sysdate from dual@REPORT; SYSDATE ----------------- 20160911 18:47:29 ANBOB@SQL> SELECT t.xidusn "RBS Name", t.used_ublk "Used Blocks", s.username "User Name" ,t.start_time FROM v$transaction t, v$session s WHERE t.ses_addr=s.saddr and s.sid=2496 / RBS Name Used Blocks User Name START_TIME ---------- ----------- ------------------------------ -------------------- 4330 1 SYS 09/11/16 18:47:23
NOTE:
oops!, 可以看到还是分配1个undo block, 开始我以为是版本有问题,测了几个版本依然存在,终于还是从官方文档找到了原因。
This clause is not supported for the user SYS. Queries by SYS will return changes made during the transaction even if SYS has set the transaction to be READ ONLY.
ANBOB@SQL> conn system Enter password: Connected. USERNAME INST_NAME HOST_NAME SID SERIAL# VERSION STARTED SPID OPID CPID SADDR PADDR -------------------- -------------------- ------------------------- ----- -------- ---------- -------- --------------- ----- --------------- ---------------- ---------------- SYSTEM anbob1 weejar1 2581 44169 11.2.0.3.0 20160130 30540308 91 30474778 0700000A018577E0 0700000A191A7728 SYSTEM@SQL> select sysdate from dual@REPORT; SYSDATE ----------------- 20160913 10:26:46 ANBOB@SQL> SELECT t.xidusn "RBS Name", 2 t.used_ublk "Used Blocks", 3 s.username "User Name" 4 ,t.start_time 5 FROM v$transaction t, 6 v$session s 7 WHERE t.ses_addr=s.saddr 8 and s.sid=2581; RBS Name Used Blocks User Name START_TIME ---------- ----------- ------------------------------ -------------------- 2975 1 SYSTEM 09/13/16 10:26:40 SYSTEM@SQL> commit; Commit complete. SYSTEM@SQL> set transaction read only; Transaction set. SYSTEM@SQL> select sysdate from dual@REPORT; SYSDATE ----------------- 20160913 10:28:27 SYSTEM@SQL> SELECT t.xidusn "RBS Name", 2 t.used_ublk "Used Blocks", 3 s.username "User Name" 4 ,t.start_time 5 FROM v$transaction t, 6 v$session s 7 WHERE t.ses_addr=s.saddr 8 and s.sid=2581; no rows selected
关于DBLINK的使用脚本可以看这里Script: Who’s using a database link?(找出谁在使用dblink)
Summary:
分布式事务即使是通过DBLINK的select也会分配一个undo block,并持有tx lock的事务,如果该会话最后停留这样的事务,就会导致长事务一直不释放,从而影响OGG, 对于这样的事务要及时COMMIT或rollback, 或在对于只查询的DBLINK事务开始前使用set transaction read only; 这个配置不影响SYS用户。
对不起,这篇文章暂时关闭评论。