Troubleshooting large amount of undo/redo generated due to dblink usage
前不久一客户的数据库出现了较高的enq: US – contention和latch: undo global data等待,从等待事件可以状态出现了较高的undo争用, 当然后来通过SQL的执行量TOP1 发现了SQL中使用了dblink,在墨天轮上记录了这个问题,查看请移步这里, 在几年前我blog 记录过一篇dblink会分配undo段的问题,请看这篇《Lots of Long transaction caused by database link, and undo hdr show DBA for that slot is 0x00000000》, 这次的案例sql的执行次数和v$undostat.TXNCOUNT 匹配98%以上,所以可以定性导致,这里简单的记录如果它的执行量不是TOP 有没有可能通过session statistic去查找一下此类问题。
— 测试环境 19.13, dblink指向自已。
[oracle@oel7db1 ~]$ sqlplus anbob/anbob@cdb1pdb1 SQL*Plus: Release 19.0.0.0.0 - Production on Tue Mar 1 15:49:38 2022 Version 19.3.0.0.0 Copyright (c) 1982, 2019, Oracle. All rights reserved. Connected to: Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production Version 19.3.0.0.0 USERNAME INST_NAME HOST_NAME I# SID SERIAL# VERSION STARTED SPID OPID CPID SADDR PADDR -------------------- -------------------- ------------------------- --- ----- -------- ---------- -------- ---------- ----- --------------- ---------------- ---------------- ANBOB PDB1-anbob19c oel7db1 1 72 21418 19.0.0.0.0 20220301 24185 42 24183 0000000078532180 0000000079103FA8 SQL> select sysdate from dual@dl; SYSDATE ------------------- 2022-03-01 15:49:49 SQL> select addr taddr,UBAFIL,UBABLK,UBASQN,status,START_TIME,XIDUSN,XIDSLOT,XIDSQN,xid from v$transaction where ses_addr='0000000078532180'; TADDR UBAFIL UBABLK UBASQN STATUS START_TIME XIDUSN XIDSLOT XIDSQN XID ---------------- ---------- ---------- ---------- ---------------- -------------------- ---------- ---------- ---------- ---------------- 00000000754D8878 0 0 0 ACTIVE 03/01/22 15:49:48 4 27 17776 04001B0070450000 SQL> select sysdate from dual@dl; SYSDATE ------------------- 2022-03-01 15:50:51 SQL> select addr taddr,UBAFIL,UBABLK,UBASQN,status,START_TIME,XIDUSN,XIDSLOT,XIDSQN,xid from v$transaction where ses_addr='0000000078532180'; TADDR UBAFIL UBABLK UBASQN STATUS START_TIME XIDUSN XIDSLOT XIDSQN XID ---------------- ---------- ---------- ---------- ---------------- -------------------- ---------- ---------- ---------- ---------------- 00000000754D8878 0 0 0 ACTIVE 03/01/22 15:49:48 4 27 17776 04001B0070450000 SQL> commit; Commit complete. SQL> select addr taddr,UBAFIL,UBABLK,UBASQN,status,START_TIME,XIDUSN,XIDSLOT,XIDSQN,xid from v$transaction where ses_addr='0000000078532180'; no rows selected SQL> select sysdate from dual@dl; SYSDATE ------------------- 2022-03-01 15:51:10 SQL> select addr taddr,UBAFIL,UBABLK,UBASQN,status,START_TIME,XIDUSN,XIDSLOT,XIDSQN,xid from v$transaction where ses_addr='0000000078532180'; TADDR UBAFIL UBABLK UBASQN STATUS START_TIME XIDUSN XIDSLOT XIDSQN XID ---------------- ---------- ---------- ---------- ---------------- -------------------- ---------- ---------- ---------- ---------------- 00000000754D8878 0 0 0 ACTIVE 03/01/22 15:51:10 1 12 18518 01000C0056480000 SQL>
Note:
同一个会话,如果没有发起commit/rollback 等隐式 结束事务,多次查询dblink时会复用之前的undo段, 如果commit结束事务下次再查询时会生成新undo段, 下面测试如果把量级加大看看统计信息的指标
循环使用dblink不提交,复用undo
SQL> create table s1 as select * from v$sesstat where sid=57; Table created. declare l_now date; begin loop select sysdate into l_now from dual@dl; end loop; end; SQL> create table s2 as select * from v$sesstat where sid=57; Table created. SQL> select s1.statistic#,sn.name,s1.value,s2.value, s2.value-s1.value diff, (s2.value-s1.value)/decode(s1.value,0,-1,s1.value) pct from s1,s2,v$statname sn where s1.statistic#=sn.statistic# and s1.sid=s2.sid and s1.statistic#=s2.statistic# and s2.value>s1.value order by 5 desc; STATISTIC# NAME VALUE VALUE DIFF PCT ---------- ---------------------------------------------------------------- ---------- ---------- ---------- ---------- 1882 bytes sent via SQL*Net to dblink 3779 1095048570 1095044791 289771.048 1883 bytes received via SQL*Net from dblink 4424 844073989 844069565 190793.301 46 enqueue conversions 14 8963541 8963527 640251.929 29 non-idle wait count 93 5975839 5975746 64255.3333 1884 SQL*Net roundtrips to/from dblink 10 2987872 2987862 298786.2 9 recursive calls 478 1494557 1494079 3125.68828 306 calls to get snapshot scn: kcmgss 100 1494066 1493966 14939.66 4 opened cursors cumulative 103 1494058 1493955 14504.4175 1878 execute count 99 1494052 1493953 15090.4343 1755 session cursor cache hits 61 1494001 1493940 24490.8197 171 logical read bytes from cache 2801664 3948544 1146880 .409356725 177 physical read bytes 180224 311296 131072 .727272727 67 cell physical IO interconnect bytes 180224 311296 131072 .727272727 57 physical read total bytes 180224 311296 131072 .727272727 33 session uga memory 956560 1022048 65488 .068461989 28 non-idle wait time 4 43309 43305 10826.25 19 CPU used by this session 6 42314 42308 7051.33333 10 recursive cpu usage 2 40616 40614 20307 405 file io wait time 653 19197 18544 28.3981623 313 redo size 964 8760 7796 8.08713693 610 undo change vector size 272 2400 2128 7.82352941 1880 bytes received via SQL*Net from client 6398 6838 440 .068771491 32 process last non-idle time 1646118238 1646118513 275 1.6706E-07 14 session logical reads 342 482 140 .409356725 164 consistent gets from cache 337 422 85 .252225519 163 consistent gets 337 422 85 .252225519 160 db block gets from cache 5 60 55 11 159 db block gets 5 60 55 11 1799 buffer is not pinned count 275 322 47 .170909091 178 db block changes 6 52 46 7.66666667 165 consistent gets pin 235 281 46 .195744681 161 db block gets from cache (fastpath) 4 44 40 10 653 no work - consistent read gets 219 258 39 .178082192 168 consistent gets examination (fastpath) 97 136 39 .402061856 167 consistent gets examination 102 141 39 .382352941 166 consistent gets pin (fastpath) 218 252 34 .155963303 1096 index range scans 74 106 32 .432432432 45 enqueue requests 15 43 28 1.86666667 47 enqueue releases 13 40 27 2.07692308 312 redo entries 3 27 24 8 1873 parse count (total) 52 75 23 .442307692 1889 sorts (memory) 17 40 23 1.35294118 1874 parse count (hard) 3 24 21 7 1872 parse time elapsed 12 30 18 1.5 1010 table scan rows gotten 2006 2024 18 .008973081 1013 table scan disk non-IMC rows gotten 2006 2024 18 .008973081 244 commit cleanouts successfully completed 1 18 17 17 243 commit cleanouts 1 18 17 17 230 free buffer requested 23 39 16 .695652174 176 physical read IO requests 22 38 16 .727272727 52 physical read total IO requests 22 38 16 .727272727 172 physical reads 22 38 16 .727272727 173 physical reads cache 22 38 16 .727272727 665 deferred (CURRENT) block cleanout applications 1 14 13 13 27 scheduler wait time 0 11 11 -11 1798 buffer is pinned count 1019 1028 9 .008832188 302 calls to kcmgcs 18 25 7 .388888889 1728 CCursor + sql area evicted 0 6 6 -6 1004 table scans (short tables) 6 12 6 1 1015 table scan blocks gotten 36 42 6 .166666667 1726 sql area purged 0 6 6 -6 1875 parse count (failures) 0 6 6 -6 1727 sql area evicted 0 6 6 -6 1871 parse time cpu 5 10 5 1 674 blocks cleaned out using minact 0 5 5 -5 1113 HSC Heap Segment Block Changes 1 5 4 4 304 calls to kcmgas 1 5 4 4 404 file io service time 0 3 3 -3 275 shared hash latch upgrades - no wait 9 12 3 .333333333 663 immediate (CURRENT) block cleanout applications 0 2 2 -2 11 pinned cursors current 0 2 2 -2 8 user calls 26 28 2 .076923077 1783 cursor authentications 0 2 2 -2 1 Requests to/from client 18 19 1 .055555556 5 opened cursors current 2 3 1 .5 26 user I/O wait time 0 1 1 -1 35 messages sent 1 2 1 1 1016 table fetch by rowid 590 591 1 .001694915 1095 index fetch by key 16 17 1 .0625 1802 workarea executions - optimal 13 14 1 .076923077 1891 sorts (rows) 160 161 1 .00625 81 rows selected. SQL> @uds Show undo statistics from V$UNDOSTAT.... MAX MAX ORA- SPC UNEXP EXP BEGIN_TIME END_TIME TXNCOUNT MB MB/s QRYLEN QRY_ID 1555 ERR STEAL STEAL ------------------- -------- ---------- ---------- ---------- ------- ------------- ----- ------ -------- -------- 2022-03-01 15:12:05 15:16:10 2 .00 .00 0 0 0 0 0 2022-03-01 15:02:05 15:12:05 116 .05 .00 0 0 0 0 0 2022-03-01 14:52:05 15:02:05 101 .06 .00 0 0 0 0 0 2022-03-01 14:42:05 14:52:05 2 .00 .00 0 0 0 0 0 2022-03-01 14:32:05 14:42:05 15 .13 .00 0 0 0 0 0 2022-03-01 14:22:05 14:32:05 0 .00 .00 0 0 0 0 0
Note:
循环调用dblink近150万次,产生了极小的undo和redo。 Top 指标为*via SQL*Net from/to dblink
循环使用dblink提交,undo频繁创建
SQL> alter database datafile '/u01/app/oracle/oradata/ANBOB19C/pdb1/undotbs01.dbf' AUTOEXTEND off; Database altered. create table s4 as select * from v$sesstat where sid=65; declare l_now date; begin loop select sysdate into l_now from dual@dl; -- close tx commit; end loop; end; create table s5 as select * from v$sesstat where sid=65; select s4.statistic#,sn.name,s4.value,s5.value, s5.value-s4.value diff, (s5.value-s4.value)/decode(s4.value,0,-1,s4.value) pct from s4,s5,v$statname sn where s4.statistic#=sn.statistic# and s4.sid=s5.sid and s4.statistic#=s5.statistic# and s5.value>s4.value order by 5 desc; STATISTIC# NAME VALUE VALUE DIFF PCT ---------- ---------------------------------------------------------------- ---------- ---------- ---------- ---------- 897 KTFB alloc space (block) 1.9574E+10 3.8997E+10 1.9423E+10 .992292652 171 logical read bytes from cache 1.5734E+10 2.7931E+10 1.2197E+10 .775161102 1882 bytes sent via SQL*Net to dblink 254715226 366200315 111485089 .437685217 1883 bytes received via SQL*Net from dblink 136773596 196636514 59862918 .437678907 313 redo size 81686552 124219108 42532556 .520680026 610 undo change vector size 19993072 31701152 11708080 .585606854 900 KTFB free space (block) 2324280 4695280 2371000 1.02010085 14 session logical reads 1920673 3409504 1488831 .775161102 160 db block gets from cache 1599499 2849606 1250107 .781561601 159 db block gets 1599499 2849606 1250107 .781561601 161 db block gets from cache (fastpath) 1372932 2518083 1145151 .834091565 45 enqueue requests 1451144 2490390 1039246 .716156357 47 enqueue releases 1451141 2490387 1039246 .716157837 1013 table scan disk non-IMC rows gotten 594567 1463631 869064 1.46167547 1010 table scan rows gotten 594567 1463631 869064 1.46167547 46 enqueue conversions 1639593 2506793 867200 .528911748 29 non-idle wait count 2444144 3225246 781102 .319581007 9 recursive calls 930465 1630012 699547 .751825163 1798 buffer is pinned count 699381 1367117 667736 .954752846 903 KTFB apply time (ms) 676874 1313069 636195 .939901666 178 db block changes 773223 1185137 411914 .532723419 1884 SQL*Net roundtrips to/from dblink 869768 1250454 380686 .437686831 898 KTFB alloc time (ms) 31125201 31458593 333392 .01071132 302 calls to kcmgcs 519496 821805 302309 .581927484 312 redo entries 537190 834531 297341 .553511793 306 calls to get snapshot scn: kcmgss 472295 713628 241333 .510979367 164 consistent gets from cache 321174 559898 238724 .743285571 163 consistent gets 321174 559898 238724 .743285571 4 opened cursors cumulative 463213 694941 231728 .500262298 1755 session cursor cache hits 463170 694895 231725 .500302265 166 consistent gets pin (fastpath) 103730 247282 143552 1.38390051 165 consistent gets pin 103731 247283 143552 1.38388717 1878 execute count 245776 382330 136554 .555603476 304 calls to kcmgas 226552 331413 104861 .46285621 234 free buffer inspected 217872 313289 95417 .437949805 230 free buffer requested 217604 312953 95349 .438176688 167 consistent gets examination 217443 312615 95172 .437687118 168 consistent gets examination (fastpath) 217443 312615 95172 .437687118 687 commit batch/immediate performed 217439 312610 95171 .437690571 685 commit batch requested 217439 312610 95171 .437690571 662 transaction rollbacks 217439 312610 95171 .437690571 684 commit batch/immediate requested 217439 312610 95171 .437690571 688 commit batch performed 217439 312610 95171 .437690571 231 dirty buffers inspected 214755 309301 94546 .440250518 57 physical read total bytes 5938921472 5939003392 81920 .000013794 177 physical read bytes 16384 98304 81920 5 67 cell physical IO interconnect bytes 5.1178E+10 5.1178E+10 81920 1.6007E-06 653 no work - consistent read gets 28506 69890 41384 1.45176454 1004 table scans (short tables) 28323 69707 41384 1.46114465 1015 table scan blocks gotten 28440 69824 41384 1.45513361 1873 parse count (total) 28341 69724 41383 1.46018136 677 drop segment calls in space pressure 13935 45773 31838 2.28475063 682 space was found by tune down 11512 38569 27057 2.35033009 681 steps of tune down ret. in space pressure 11512 38569 27057 2.35033009 680 tune down retentions in space pressure 11512 38569 27057 2.35033009 902 KTFB apply req 18865 38458 19593 1.03858998 901 KTFB free time (ms) 15431 27838 12407 .804030847 899 KTFB free req 9334 19131 9797 1.0496036 896 KTFB alloc req 9531 19327 9796 1.02780401 19 CPU used by this session 19324 28129 8805 .455651004 10 recursive cpu usage 18734 27242 8508 .454147539 28 non-idle wait time 43042 50203 7161 .16637238 35 messages sent 2819 4439 1620 .574671869 228 change write time 884 1366 482 .545248869 405 file io wait time 98 375 277 2.82653061 43 enqueue waits 158 265 107 .67721519 1872 parse time elapsed 78 173 95 1.21794872 261 write clones created in foreground 8 92 84 10.5 1871 parse time cpu 67 131 64 .955223881 698 local undo segment hints helped 81 140 59 .728395062 232 pinned buffers inspected 0 24 24 -24 172 physical reads 2 12 10 5 173 physical reads cache 2 12 10 5 176 physical read IO requests 2 12 10 5 52 physical read total IO requests 492200 492210 10 .000020317 676 auto extends on undo tablespace 10699 10706 7 .000654267 233 hot buffers moved to head of LRU 410 416 6 .014634146 275 shared hash latch upgrades - no wait 2 6 4 2 78 rows selected. SQL> l 1 select * from ( 2 select 3 begin_time, 4 to_char(end_time, 'HH24:MI:SS') end_time, 5 txncount, 6 undoblks * (select block_size from dba_tablespaces where upper(tablespace_name) = 7 (select upper(value) from v$parameter where name = 'undo_tablespace') 8 ) / 1048576 uds_MB , 9 undoblks * (select block_size from dba_tablespaces where upper(tablespace_name) = 10 (select upper(value) from v$parameter where name = 'undo_tablespace') 11 ) / ((end_time-begin_time) * 86400) / 1048576 uds_MB_sec , 12 maxquerylen uds_maxquerylen, 13 maxqueryid uds_maxqueryid, 14 ssolderrcnt uds_ssolderrcnt, 15 nospaceerrcnt uds_nospaceerrcnt, 16 unxpstealcnt uds_unxpstealcnt, 17 expstealcnt uds_expstealcnt 18 from 19 v$undostat 20 order by 21 begin_time desc 22* ) where rownum <= 30 MAX MAX ORA- SPC UNEXP EXP BEGIN_TIME END_TIME TXNCOUNT MB MB/s QRYLEN QRY_ID 1555 ERR STEAL STEAL ------------------- -------- ---------- ---------- ---------- ------- ------------- ----- ------ -------- -------- 2022-03-01 17:12:05 17:15:28 3 .00 .00 0 0 0 0 0 2022-03-01 17:02:05 17:12:05 0 .00 .00 0 0 0 0 0 2022-03-01 16:52:05 17:02:05 53784 293.07 .49 0 0 0 3172 11591 2022-03-01 16:42:05 16:52:05 292426 1737.87 2.90 0 0 0 4954 47402 2022-03-01 16:32:05 16:42:05 122635 952.98 1.59 0 0 0 0 281 2022-03-01 16:22:05 16:32:05 112743 678.39 1.13 0 0 0 1167 14951 2022-03-01 16:12:05 16:22:05 194688 1507.20 2.51 0 0 0 20 1087 2022-03-01 16:02:05 16:12:05 0 .00 .00 0 0 0 0 0 2022-03-01 15:52:05 16:02:05 15 .02 .00 0 0 0 0 0 2022-03-01 15:42:05 15:52:05 6 .00 .00 0 0 0 0 0 2022-03-01 15:32:05 15:42:05 17 .01 .00 0 0 0 0 0 2022-03-01 15:22:05 15:32:05 24 .01 .00 0 0 0 0 0 2022-03-01 15:12:05 15:22:05 9 .02 .00 0 0 0 0 0 2022-03-01 15:02:05 15:12:05 116 .05 .00 0 0 0 0 0 2022-03-01 14:52:05 15:02:05 101 .06 .00 0 0 0 0 0 2022-03-01 14:42:05 14:52:05 2 .00 .00 0 0 0 0 0 2022-03-01 14:32:05 14:42:05 15 .13 .00 0 0 0 0 0 2022-03-01 14:22:05 14:32:05 0 .00 .00 0 0 0 0 0 2022-03-01 14:12:05 14:22:05 1 .00 .00 0 0 0 0 0 2022-03-01 14:02:05 14:12:05 0 .00 .00 0 0 0 0 0 2022-03-01 13:52:05 14:02:05 2 .00 .00 0 0 0 0 0 2022-03-01 13:42:05 13:52:05 0 .00 .00 0 0 0 0 0 2022-03-01 13:32:05 13:42:05 4 .01 .00 0 0 0 0 0 2022-03-01 13:22:05 13:32:05 0 .00 .00 0 0 0 0 0 2022-03-01 13:12:05 13:22:05 1 .00 .00 0 0 0 0 0 2022-03-01 13:02:05 13:12:05 0 .00 .00 0 0 0 0 0 2022-03-01 12:52:05 13:02:05 5 .00 .00 0 0 0 0 0 2022-03-01 12:42:05 12:52:05 0 .00 .00 0 0 0 0 0 2022-03-01 12:32:05 12:42:05 4 .01 .00 0 0 0 0 0 2022-03-01 12:22:05 12:32:05 0 .00 .00 0 0 0 0 0 30 rows selected. SQL> select begin_time,(END_TIME-begin_time)*3600*24 secs,undoblks,txncount,MAXCONCURRENCY, EXPSTEALCNT,EXPBLKRELCNT,EXPBLKREUCNT,EXPIREDBLKS from v$undostat where begin_time>sysdate-1 order by begin_time; # Expired BEGIN_TIME SECS UNDOBLKS TXNCOUNT MAXCONCURRENCY Reused EXPBLKRELCNT EXPBLKREUCNT EXPIREDBLKS ------------------- ---------- ---------- ---------- -------------- ------------ ------------ ------------ ----------- ... 2022-03-01 15:32:05 600 1 17 1 0 0 0 0 2022-03-01 15:42:05 600 0 6 1 0 0 0 0 2022-03-01 15:52:05 600 2 15 2 0 0 0 0 2022-03-01 16:02:05 600 0 0 1 0 0 0 0 2022-03-01 16:12:05 600 192921 194688 3 1,087 61296 0 0 2022-03-01 16:22:05 600 86834 112743 2 14,951 2319648 0 0 2022-03-01 16:32:05 600 121981 122635 4 281 73672 0 0 2022-03-01 16:42:05 600 222447 292426 3 47,402 3629736 0 8592 2022-03-01 16:52:05 600 37513 53784 4 11,591 1052896 0 8592 2022-03-01 17:02:05 600 0 0 0 0 0 0 8592 2022-03-01 17:12:05 180 0 3 0 0 0 0 8464 SQL> show parameter undo PARAMETER_NAME TYPE VALUE ------------------------------------------------------------ ----------- ---------------------------------------------------------------------------------------------------- _undo_autotune boolean FALSE temp_undo_enabled boolean FALSE undo_management string AUTO undo_retention integer 900 undo_tablespace string UNDOTBS1
Note:
可见undo出现了undo segment STEAL,从未过期的undo段偷窃, 这里才执行了14万次,会话指标出现了较高的KTFB alloc space,等几个上次调用未出现的wait event, 且产生了大量的redo和UNDO, 我这仅仅只是select sysdate from dual@dblink 。
Summary:
因为使用dblink需要分配undo段来标示分布式事务,如果在循环中使用dblink并commit,每次会分配新的undo段,同时undo retention如果保留时间较常,那可能会导致undo自动扩展很大(autoextent on ), 或者会出现undo段争用,从未过期的undo段偷窃, 就会影响正常的DML事务。 记的前年的一次线上技术分享中我也描述过,分析性能问题常用的手段如top event, 统计信息指标,dump event,进程call stack, 如果以后有这类问题时通过上面的测试可以判断即使只有dblink 查询也可以找top undo占用多的会话,同时结合如dblink 和ktfb类指标去检查是否大量使用了dblink。
— enjoy —
对不起,这篇文章暂时关闭评论。