Performance tuning ‘wait for a undo record’ event
前几日一个数据库的负载非常高,查看数据库的活动会话发现大部分session在等待’wait for a undo record’的事件, 该事件通常发生在fast-start parallel rollback, 该特性通常是在SMON进程发现存在一个长时间的事务需要回滚,或遇到PARALLE DML需要回滚时,超过一定量的回滚任务时自动启动多个server process的并行回滚,有些情况下因为同一个回滚任务PQ进程需要使用相同的资源,所以在多进程并行协调过程中性能变的更糟糕,没有串行回滚做的快。
这种情况时有几个并发症需要注意:
注意负载高,cpu使用导致数据库hang
因为并行的原因注意session数满
注意UNDO的增升,undo无法扩展的错误(虽然undo理论可以循环使用,但是按照undo block SEQ#值向后滚动,如果其中有个extent含未活动事务,seq#前的EXTENT即使过了保留期也无法被利用,随然有undo extent剽窃的现象,但问题也可能出现,所以ROLLBACK越快越好)
SQL> col bs for a10
SQL> select ses.username, ses.sid, ses.event, ses.machine, ses.program, ses.status,
ses.last_call_et, sql.hash_value, ses.sql_id,wait_time||':'||SECONDS_IN_WAIT wai_secinwait ,
blocking_instance||':'||blocking_session bs,substr(sql.sql_text,1,30) sqltext,sql_child_number ch#
from v$session ses left join v$sql sql
on ses.sql_hash_value = sql.hash_value and
ses.sql_child_number=sql.child_number
where ses.type ='USER' and ses.status'INACTIVE' -- and sql_text like 'select t.subsid,s.servnumber,t%'
order by SECONDS_IN_WAIT,last_call_et,4;
USERNAME SID EVENT MACHINE PROGRAM STATUS LAST_CALL_ET HASH_VALUE SQL_ID WAI_SECINW BS SQLTEXT CH#
---------- ---------- -------------------- ---------- -------------- -------- ------------ ---------- -------------------- ---------- ---------- ---------- ----------
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
db file sequential r anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
...
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
wait for a undo reco anbob1 oracle@anbob1 ACTIVE 9 0:0 : 0
rows selected.
Note:
这里我已经截取了部分输出,有100左右的会话在wait for a undo recode事件,查看回滚信息
SQL>
CURRENT_TIME
-----------------
1102 05:02:13
SQL> select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;
USN STATE Total Done ToDo Estimated time to
---------- ---------------- ---------- ---------- ---------- -----------------
RECOVERING 213219 1010 212209 20151102 09:18:55
RECOVERED 2574 2574 0 20151102 05:03:17
RECOVERED 2682 2682 0 20151102 05:03:17
RECOVERED 4755 4755 0 20151102 05:03:17
RECOVERED 323 323 0 20151102 05:03:17
RECOVERED 128 128 0 20151102 05:03:17
RECOVERED 2185 2185 0 20151102 05:03:17
RECOVERED 21 21 0 20151102 05:03:17
rows selected.
SQL> /
USN STATE Total Done ToDo Estimated time to
---------- ---------------- ---------- ---------- ---------- -----------------
RECOVERING 213219 1010 212209 20151102 09:29:28
RECOVERED 2574 2574 0 20151102 05:03:20
RECOVERED 2682 2682 0 20151102 05:03:20
RECOVERED 4755 4755 0 20151102 05:03:20
RECOVERED 323 323 0 20151102 05:03:20
RECOVERED 128 128 0 20151102 05:03:20
RECOVERED 2185 2185 0 20151102 05:03:20
RECOVERED 21 21 0 20151102 05:03:20
Note:
确实有一个事务在回滚,根据估算这么多并发进程也要4个多小时完成, 因为当时半夜时分数据库的业务较少,加大了paralle 试试,修改实例参数fast_start_parallel_rollback=HIGH, 发现速度也并未提高,解决这个事件我们可以禁用parallel rollback,通过动态的修改fast_start_parallel_rollback参数.
SQL> alter system set fast_start_parallel_rollback = false;
SQL> select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;
USN STATE Total Done ToDo Estimated time to
---------- ---------------- ---------- ---------- ---------- -----------------
RECOVERED 2574 2574 0 20151102 05:04:06
RECOVERED 2682 2682 0 20151102 05:04:06
RECOVERED 4755 4755 0 20151102 05:04:06
RECOVERED 323 323 0 20151102 05:04:06
RECOVERED 128 128 0 20151102 05:04:06
RECOVERED 2185 2185 0 20151102 05:04:06
RECOVERED 21 21 0 20151102 05:04:06
rows selected.
SQL>
CURRENT_TIME
-----------------
1102 05:04:08
SQL> select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;
USN STATE Total Done ToDo Estimated time to
---------- ---------------- ---------- ---------- ---------- -----------------
RECOVERED 2574 2574 0 20151102 05:08:40
RECOVERED 2682 2682 0 20151102 05:08:40
RECOVERED 4755 4755 0 20151102 05:08:40
RECOVERING 211330 2814 208516 20151102 10:23:35
RECOVERED 323 323 0 20151102 05:08:40
RECOVERED 128 128 0 20151102 05:08:40
RECOVERED 2185 2185 0 20151102 05:08:40
RECOVERED 21 21 0 20151102 05:08:40
rows selected.
Note:
修改fast_start_parallel_rollback参数禁用paralle 回滚后,回滚任务重启,并且并行的会话消失,‘wait for a undo recode’ 事件消失,串行的恢复速度估算是比比并行慢了些(但只是估算), 细水长流最后回滚完成。注意不建议在大量活动会话DML多时的较忙的实例动态修改该参数。
对不起,这篇文章暂时关闭评论。