首页 » ORACLE 9i-23ai » Troubleshooting Oracle RAC high active sessions wait log file sync &buffer busy waits & gc xx events due to Interconnect latency

Troubleshooting Oracle RAC high active sessions wait log file sync &buffer busy waits & gc xx events due to Interconnect latency

最近有家半导体客户的oracle RAC间隔性出现了2次活动会话高负载的堵塞现象,找到我协助分析, 当时的主要活动会等待是buffer busy wait, enq: tx … 的hot object现象,同时有较高的log file sync 单次延时,当分析数据库性能问题时,从top event入手没错,但是也要防止被带偏,也可能是因为其它的影响,简单记录这个事件。

AWR profile

Per Second Per Transaction Per Exec Per Call
DB Time(s): 215.8 1.1 0.08 0.02
DB CPU(s): 4.1 0.0 0.00 0.00
Redo size (bytes): 3,217,300.8 15,586.6
Logical read (blocks): 219,035.8 1,061.2
Block changes: 12,901.5 62.5
Physical read (blocks): 6,808.4 33.0
Physical write (blocks): 679.3 3.3
Read IO requests: 97.5 0.5
Write IO requests: 116.5 0.6
Read IO (MB): 53.2 0.3
Write IO (MB): 5.3 0.0
Global Cache blocks received: 6.8 0.0
Global Cache blocks served: 211.6 1.0
User calls: 10,975.3 53.2
Parses (SQL): 4,106.6 19.9
Hard parses (SQL): 0.4 0.0
SQL Work Area (MB): 26.3 0.1
Logons: 4.9 0.0
Executes (SQL): 2,758.7 13.4
Rollbacks: 12.0 0.1
Transactions: 206.4

 

Top 10 Foreground Events by Total Wait Time

Event Waits Total Wait Time (sec) Wait Avg(ms) % DB time Wait Class
buffer busy waits 283,804 212.4K 748 27.2 Concurrency
enq: TX – row lock contention 377,781 177K 469 22.7 Application
log file sync 740,503 103.5K 140 13.3 Commit
enq: TX – index contention 176,726 84.6K 479 10.9 Concurrency
rdbms ipc reply 198,695 81.7K 411 10.5 Other
enq: TA – contention 1,260 23.3K 18505 3.0 Other
enq: US – contention 3,420 15.1K 4421 1.9 Other
DB CPU 14.9K 1.9
KJC: Wait for msg sends to complete 5,491 10.1K 1848 1.3 Other
DFS lock handle 21,161 9986.8 472 1.3 Other

Note:

主要的wait是buffer busy热块或enq : tx。。行锁, 及log file sync, 看上去像是高并发,但并不一定,注意log file sync wait avg 140ms, 上面的事务并没有多大量级。 想象一下 commit时会做哪个操作? 持久化或广播不是吗?

Background Wait Events

Event Waits %Time -outs Total Wait Time (s) Avg wait (ms) Waits /txn % bg time
DFS lock handle 20,163 100 1,349 67 0.03 30.20
log file parallel write 694,521 0 590 1 0.93 13.20
wait for scn ack 727 0 441 607 0.00 9.88
gcs log flush sync 35,982 10 182 5 0.05 4.07
buffer busy waits 86 0 100 1164 0.00 2.24
enq: CO – master slave det 1,053 100 90 85 0.00 2.01

Note:

log file parallel write正常范围,但scn ask 异常高。

DFS lock handle

SQL> select instance_number,p1,count(*) from anbob.ASH_DBA20240802  where sample_time between to_date('20240802 15:20','yyyymmdd hh24:mi') and to_date('20240802 15:50','yyyymmdd hh24:mi')
and event='DFS lock handle' group by instance_number,p1  2  ;

INSTANCE_NUMBER         P1   COUNT(*)
--------------- ---------- ----------
              2 1111621637          3
              2 1146617861          4
              2 1413545989          3
              1 1111621637        127
              1 1413545989        794
              1 1146617861        389

SQL> select chr(bitand(&p1,-16777216)/16777215) ||
  chr(bitand(&p1,16711680)/65535) type,
  mod(&p1, 16) md
  from dual
  ;  2    3    4    5
Enter value for p1: 1413545989

TYPE         MD
---- ----------
TA            5

SQL> /
Enter value for p1: 1146617861

TYPE         MD
---- ----------
DX            5

SQL> select * from v$lock_type where type='DX';

TYPE       NAME                           ID1_TAG    ID2_TAG    IS_USE DESCRIPTION
---------- ------------------------------ ---------- ---------- ------ ----------------------------------------
DX         Distributed Transaction        transactio 0          NO     Serializes tightly coupled distributed t
                                          n entry #                    ransaction branches

SQL> select * from v$lock_type where type='TA'
  2  ;

TYPE       NAME                           ID1_TAG    ID2_TAG    IS_USE DESCRIPTION
---------- ------------------------------ ---------- ---------- ------ ----------------------------------------
TA         Instance Undo                  operation  undo segme NO     Serializes operations on undo segments a
                                                     nt # / oth        nd undo tablespaces
                                                     er

ASH wait chain

SQL> @mydash program2||event2 session_type='FOREGROUND' "to_date('20240802 15:28','yyyymmdd hh24:mi')" "to_date('20240802 15:36','yyyymmdd hh24:mi')"

-- Display ASH Wait Chain Signatures script v0.6 BETA by Tanel Poder ( https://tanelpoder.com )

%This     SECONDS        AAS #Blkrs WAIT_CHAIN                                                                                                                                                                                                                                                                                                   FIRST_SEEN                             LAST_SEEN
------ ---------- ---------- ------ ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ -------------------------------------- --------------------------------------
  10%       39960       83.3      4 -> (pro1) rdbms ipc reply  -> (GTXn) DFS lock handle                                                                                                                                                                                                                                      2024-08-02 15:29:21                    2024-08-02 15:35:24
   9%       39410       82.1     36 -> (pro2) buffer busy waits [data block]  -> (pro2) gc current grant 2-way                                                                                                                                                                                             2024-08-02 15:29:31                    2024-08-02 15:35:24
   4%       17430       36.3     27 -> (pro2) enq: TX - row lock contention [mode=6]  -> (pro2) buffer busy waits [data block]  -> (pro2) gc current grant 2-way                                                                                                                        2024-08-02 15:29:31                    2024-08-02 15:35:24
   4%       15260       31.8     17 -> (pro3) enq: TX - row lock contention [mode=6]  -> (pro3) gc current grant 2-way                                                                                                                                                                                     2024-08-02 15:29:31                    2024-08-02 15:35:24
   3%       11770       24.5      1 -> (pro3) log file sync  -> (LGWR) wait for scn ack                                                                                                                                                                                                                                       2024-08-02 15:30:34                    2024-08-02 15:35:24
   3%       11170       23.3      1 -> (pro2) log file sync  -> (LGWR) wait for scn ack                                                                                                                                                                                                                                       2024-08-02 15:30:34                    2024-08-02 15:35:24
   2%        7430       15.5      1 -> (pro2) log file sync  -> (LGWR) wait for scn ack  -> [idle blocker 2,2458,1]                                                                                                                                                                                                           2024-08-02 15:29:21                    2024-08-02 15:30:23
   1%        6150       12.8      5 -> (pro2) enq: TX - index contention [mode=4]  -> (pro2) enq: FB - contention [mode=6]                                                                                                                                                                                 2024-08-02 15:30:12                    2024-08-02 15:34:21
   1%        5240       10.9      4 -> (pro3) enq: TX - row lock contention [mode=6]  -> (pro3) buffer busy waits [data block]  -> (pro3) gc current grant 2-way                                                                                                                        2024-08-02 15:30:12                    2024-08-02 15:33:07
   1%        5220       10.9      1 -> (pro3) log file sync  -> (LGWR) wait for scn ack  -> [idle blocker 2,2458,1]                                                                                                                                                                                                           2024-08-02 15:29:21                    2024-08-02 15:30:23
   1%        4340          9     42 -> (extract) control file sequential read                                                                                                                                                                                                                                                                    2024-08-02 15:29:12                    2024-08-02 15:35:59
   1%        4320          9      3 -> (pro2) enq: TX - index contention [mode=4]  -> (pro2) buffer busy waits [data block]  -> (pro2) buffer busy waits [data block]  -> (pro2) gc current grant 2-way

 

Other Instance Activity Stats

Statistic Total per Second per Trans
redo write broadcast ack count 55,254 15.29 0.07
redo write broadcast ack time 2,555,084,270 707,047.46 3,425.38
redo write broadcast lgwr post count 33 0.01 0.00

Host CPU

CPUs Cores Sockets Load Average Begin Load Average End %User %System %WIO %Idle
32 32 4 0.14 0.15 13.3 2.6 0.0 84.1

 

Global Cache Load Profile

Per Second Per Transaction
Global Cache blocks received: 6.81 0.03
Global Cache blocks served: 211.59 1.03
GCS/GES messages received: 498.81 2.42
GCS/GES messages sent: 487.26 2.36
DBWR Fusion writes: 7.49 0.04
Estd Interconnect traffic (KB) 1,939.76

Global Cache and Enqueue Services – Workload Characteristics

Avg global enqueue get time (ms): 49.2
Avg global cache cr block receive time (ms): 551.0
Avg global cache current block receive time (ms): 157.4
Avg global cache cr block build time (ms): 0.0
Avg global cache cr block send time (ms): 0.0
Global cache log flushes for cr blocks served %: 2.6
Avg global cache cr block flush time (ms): 15.8
Avg global cache current block pin time (ms): 0.2
Avg global cache current block send time (ms): 0.0
Global cache log flushes for current blocks served %: 0.0
Avg global cache current block flush time (ms): 30.6

 

Global Cache and Enqueue Services – Messaging Statistics

Avg message sent queue time (ms): 0.1
Avg message sent queue time on ksxp (ms): 91.1
Avg message received queue time (ms): 0.0
Avg GCS message process time (ms): 0.0
Avg GES message process time (ms): 0.0
% of direct sent messages: 67.04
% of indirect sent messages: 31.78
% of flow controlled messages: 1.17

 

Global Messaging Statistics

Statistic Total per Second per Trans
acks for commit broadcast(actual) 43,823 12.13 0.06
acks for commit broadcast(logical) 48,723 13.48 0.07
broadcast msgs on commit(actual) 662,139 183.23 0.89
broadcast msgs on commit(logical) 688,529 190.53 0.92
broadcast msgs on commit(wasted) 21,462 5.94 0.03
dynamically allocated gcs resources 0 0.00 0.00

Interconnect Device Statistics

Device Name IP Address Public Source Send Mbytes/sec Send Errors Send Dropped Send Buffer Overrun Send Carrier Lost Receive Mbytes/sec Receive Errors Receive Dropped Receive Buffer Overrun Receive Frame Errors
lan901:801 169.254.118.232 NO 0.00 0 0 0 0 0.00 0 0 0 0

 Interconnect Ping Latency Stats  –问题时间点

Target Instance 500B Ping Count Avg Latency 500B msg Stddev 500B msg 8K Ping Count Avg Latency 8K msg Stddev 8K msg
1 288 0.15 0.04 288 0.14 0.04
2 287 456.64 1,664.70 287 458.05 1,666.25

 

 

Interconnect Ping Latency Stats –其他正常时间点

Target Instance 500B Ping Count Avg Latency 500B msg Stddev 500B msg 8K Ping Count Avg Latency 8K msg Stddev 8K msg
1 285 0.15 0.04 285 0.14 0.04
2 285 0.27 0.23 285 0.39 0.26

小结:
TOP event 基本是被跨实例调用相关的等待,如DFS LOCK和GC ,而log file sync是等scn ask也是私网广播。而RAC的统计信息发现网络延迟过高,是RAC无法接受的,因为该环境无OSW, 但目前的数据基本怀疑是interconnect 延时过高导致。

 

如果您有相关问题 ,联系www.anbob.com 首页的联系方式

打赏

, ,

对不起,这篇文章暂时关闭评论。