Troubleshooting a High Private/Interconnect Network Latency Case
Squeezing the maximum out of system resources is the goal of tuning, but an over-"optimized" system has little tolerance for faults: the slightest stall can trigger an avalanche. This is a case I analyzed a while ago on what was billed as one of the best HP Unix server configurations in Asia, an Oracle 12c R1 environment tuned hands-on by Oracle's own consultants, and it did indeed run very efficiently. Then one day, high latency on the private network between the RAC nodes caused frequent gc-related waits and, in turn, a performance problem.
node2 AWR for the half hour before the problem
Load Profile
 | Per Second | Per Transaction | Per Exec | Per Call |
---|---|---|---|---|
DB Time(s): | 380.7 | 0.2 | 0.00 | 0.00 |
DB CPU(s): | 68.2 | 0.0 | 0.00 | 0.00 |
Background CPU(s): | 8.6 | 0.0 | 0.00 | 0.00 |
Redo size (bytes): | 39,391,148.6 | 16,534.7 | ||
Logical read (blocks): | 2,776,413.2 | 1,165.4 | ||
Block changes: | 178,201.7 | 74.8 | ||
Physical read (blocks): | 29,338.7 | 12.3 | ||
Physical write (blocks): | 6,488.4 | 2.7 | ||
Read IO requests: | 18,763.1 | 7.9 | ||
Write IO requests: | 2,981.2 | 1.3 | ||
Read IO (MB): | 229.2 | 0.1 | ||
Write IO (MB): | 50.7 | 0.0 | ||
IM scan rows: | 0.0 | 0.0 | ||
Session Logical Read IM: | ||||
Global Cache blocks received: | 2,144.3 | 0.9 | ||
Global Cache blocks served: | 3,382.7 | 1.4 | ||
User calls: | 112,248.9 | 47.1 | ||
Parses (SQL): | 4,135.5 | 1.7 | ||
Hard parses (SQL): | 17.6 | 0.0 | ||
SQL Work Area (MB): | 36.1 | 0.0 | ||
Logons: | 15.4 | 0.0 | ||
Executes (SQL): | 93,517.4 | 39.3 | ||
Rollbacks: | 2.0 | 0.0 | ||
Transactions: | 2,382.3 |
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: | 99.92 | Redo NoWait %: | 100.00 |
Buffer Hit %: | 98.98 | In-memory Sort %: | 100.00 |
Library Hit %: | 100.26 | Soft Parse %: | 99.57 |
Execute to Parse %: | 95.58 | Latch Hit %: | 99.77 |
Parse CPU to Parse Elapsd %: | 54.18 | % Non-Parse CPU: | 99.66 |
Flash Cache Hit %: | 0.00 |
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Wait Avg(ms) | % DB time | Wait Class |
---|---|---|---|---|---|
DB CPU | | 60.1K | | 17.9 | |
db file sequential read | 10,859,769 | 13.7K | 1.27 | 4.1 | User I/O |
log file sync | 2,211,105 | 13.4K | 6.06 | 4.0 | Commit |
latch: cache buffers chains | 379,825 | 3994.9 | 10.52 | 1.2 | Concurrency |
gc cr grant 2-way | 4,700,313 | 3993.6 | 0.85 | 1.2 | Cluster |
enq: TX - row lock contention | 732 | 1991.3 | 2720.37 | .6 | Application |
latch free | 378,211 | 1389.8 | 3.67 | .4 | Other |
db file parallel read | 534,612 | 1239.2 | 2.32 | .4 | User I/O |
gc buffer busy acquire | 308,207 | 1225.6 | 3.98 | .4 | Cluster |
gc current grant 2-way | 959,084 | 851.4 | 0.89 | .3 | Cluster |
node2 AWR 15 minutes later
Load Profile
 | Per Second | Per Transaction | Per Exec | Per Call |
---|---|---|---|---|
DB Time(s): | 2,366.8 | 1.8 | 0.05 | 0.04 |
DB CPU(s): | 48.4 | 0.0 | 0.00 | 0.00 |
Background CPU(s): | 7.3 | 0.0 | 0.00 | 0.00 |
Redo size (bytes): | 17,262,623.5 | 12,978.6 | ||
Logical read (blocks): | 1,821,514.9 | 1,369.5 | ||
Block changes: | 80,043.7 | 60.2 | ||
Physical read (blocks): | 23,230.6 | 17.5 | ||
Physical write (blocks): | 4,214.2 | 3.2 | ||
Read IO requests: | 15,238.3 | 11.5 | ||
Write IO requests: | 2,345.5 | 1.8 | ||
Read IO (MB): | 181.5 | 0.1 | ||
Write IO (MB): | 32.9 | 0.0 | ||
IM scan rows: | 0.0 | 0.0 | ||
Session Logical Read IM: | ||||
Global Cache blocks received: | 6,799.8 | 5.1 | ||
Global Cache blocks served: | 7,779.9 | 5.9 | ||
User calls: | 62,306.9 | 46.8 | ||
Parses (SQL): | 2,077.9 | 1.6 | ||
Hard parses (SQL): | 22.2 | 0.0 | ||
SQL Work Area (MB): | 19.1 | 0.0 | ||
Logons: | 12.6 | 0.0 | ||
Executes (SQL): | 43,196.6 | 32.5 | ||
Rollbacks: | 3.1 | 0.0 | ||
Transactions: | 1,330.1 |
Instance Efficiency Percentages (Target 100%)
Buffer Nowait %: | 99.66 | Redo NoWait %: | 100.00 |
Buffer Hit %: | 98.72 | In-memory Sort %: | 100.00 |
Library Hit %: | 100.14 | Soft Parse %: | 98.93 |
Execute to Parse %: | 95.19 | Latch Hit %: | 105.08 |
Parse CPU to Parse Elapsd %: | 70.98 | % Non-Parse CPU: | 99.62 |
Flash Cache Hit %: | 0.00 |
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Wait Avg(ms) | % DB time | Wait Class |
---|---|---|---|---|---|
latch: row cache objects | 5,655,292 | 477K | 84.35 | 21.7 | Concurrency |
enq: TX - index contention | 305,813 | 376.4K | 1230.67 | 17.1 | Concurrency |
gc cr grant 2-way | 4,342,796 | 211K | 48.59 | 9.6 | Cluster |
buffer busy waits | 723,245 | 202.2K | 279.60 | 9.2 | Concurrency |
gc buffer busy release | 364,833 | 198.8K | 544.96 | 9.0 | Cluster |
gc buffer busy acquire | 2,666,894 | 137K | 51.36 | 6.2 | Cluster |
gc cr multi block request | 625,425 | 55K | 87.91 | 2.5 | Cluster |
DB CPU | | 45K | | 2.0 | |
gc current grant 2-way | 812,849 | 40.1K | 49.34 | 1.8 | Cluster |
gc current grant busy | 459,003 | 28.1K | 61.16 | 1.3 | Cluster |
Host CPU
CPUs | Cores | Sockets | Load Average Begin | Load Average End | %User | %System | %WIO | %Idle |
---|---|---|---|---|---|---|---|---|
256 | 128 | 16 | 0.40 | 0.33 | 28.0 | 5.0 | 0.6 | 67.1 |
Instance CPU
%Total CPU | %Busy CPU | %DB time waiting for CPU (Resource Manager) |
---|---|---|
23.4 | 71.0 | 0.0 |
AWR top events on the other node, node1, at the time of the problem
Top 10 Foreground Events by Total Wait Time
Event | Waits | Total Wait Time (sec) | Wait Avg(ms) | % DB time | Wait Class |
---|---|---|---|---|---|
gc buffer busy acquire | 1,940,939 | 106.1K | 54.67 | 16.7 | Cluster |
gc cr block 2-way | 4,050,009 | 92.3K | 22.78 | 14.5 | Cluster |
gc cr grant 2-way | 3,445,569 | 70.5K | 20.46 | 11.1 | Cluster |
enq: TX - index contention | 48,310 | 59K | 1222.08 | 9.3 | Concurrency |
DB CPU | | 38.4K | | 6.0 | |
gc current block 2-way | 1,636,339 | 35K | 21.40 | 5.5 | Cluster |
gc buffer busy release | 58,210 | 32.9K | 565.78 | 5.2 | Cluster |
gc cr multi block request | 822,890 | 32.6K | 39.62 | 5.1 | Cluster |
gc cr grant congested | 215,405 | 27.8K | 129.05 | 4.4 | Cluster |
latch: row cache objects | 6,116,171 | 23.5K | 3.84 | 3.7 | Concurrency |
Host CPU
CPUs | Cores | Sockets | Load Average Begin | Load Average End | %User | %System | %WIO | %Idle |
---|---|---|---|---|---|---|---|---|
256 | 128 | 16 | 0.26 | 0.25 | 21.3 | 4.2 | 0.1 | 74.6 |
Instance CPU
%Total CPU | %Busy CPU | %DB time waiting for CPU (Resource Manager) |
---|---|---|
18.8 | 73.8 | 0.0 |
At the time, pinging across the private network between the nodes already showed abnormal latency; it should normally be under 1 ms. CPU utilization on both nodes did not look high, although OSWatcher data is needed to pin down the exact time points more precisely.
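As a first sanity check, round-trip latency on the private network can be sampled with ping, using both a small payload and one close to an 8K GC block, mirroring the AWR "Interconnect Ping Latency Stats" section. A minimal sketch in Linux ping syntax (options differ on HP-UX); PEER_IP and IFACE are placeholders for the remote node's private address and the local interconnect interface:

```bash
#!/bin/bash
# Probe private-interconnect latency with small and large payloads (Linux ping syntax).
# PEER_IP and IFACE are placeholders -- substitute the remote node's private/HAIP
# address and the local interconnect interface for your cluster.
PEER_IP=192.168.10.2
IFACE=eth1

# ~500-byte message, comparable to the 500B ping in "Interconnect Ping Latency Stats"
ping -c 20 -s 500 -I "$IFACE" "$PEER_IP"

# Payload close to an 8K block; with a 1500-byte MTU this also exercises IP fragmentation
ping -c 20 -s 8192 -I "$IFACE" "$PEER_IP"
```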
Analysis approach
1. Check system resource utilization on both nodes, for example whether high CPU usage is slowing down the LMS processes (a quick check is sketched after this list).
2. Compare the two periods with an AWR diff report (AWRDD / awrddrpt.sql).
3. Check network latency and bottlenecks, for example with:
$ netstat -s
$ ss -s
$ sar -n DEV 1 5
$ sar -n TCP 1 5
# ip link show    -- check network interface status (up/down)
# ethtool eth0    -- query settings of an ethernet device, such as link speed, packet drops, etc.
Other tools: iftop, nload, speedometer, nethogs, tcptrack -i eth0
4. Use ASH and OSWatcher at minute or second granularity to find the earliest symptoms of the problem.
5. Review how the application is deployed across the instances.
And so on.
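For step 1, a quick way to see whether the LMS processes themselves are short of CPU is to sample the run queue together with per-LMS CPU usage. A minimal Linux sketch; the ora_lms* process-name pattern is the usual convention and should be verified on your system:

```bash
#!/bin/bash
# Sample overall CPU/run queue and per-LMS CPU usage a few times (Linux).
# Assumes the conventional ora_lms* background process names.
for i in 1 2 3; do
    date
    vmstat 1 2 | tail -1                    # r = run queue, us/sy/id = CPU split
    ps -eo pid,pcpu,comm | grep -i '[l]ms'  # CPU consumed by each LMS process
    sleep 5
done
```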
RAC sections of the AWR reports
NODE1
RAC Statistics
 | Begin | End |
---|---|---|
Number of Instances: | 2 | 2 |
Global Cache Load Profile
 | Per Second | Per Transaction |
---|---|---|
Global Cache blocks received: | 8,087.67 | 11.81 |
Global Cache blocks served: | 6,770.28 | 9.89 |
GCS/GES messages received: | 53,558.17 | 78.23 |
GCS/GES messages sent: | 62,283.68 | 90.98 |
DBWR Fusion writes: | 272.70 | 0.40 |
Estd Interconnect traffic (KB) | 141,488.97 |
Global Cache and Enqueue Services – Workload Characteristics
Avg global enqueue get time (ms): | 11.5 |
Avg global cache cr block receive time (ms): | 23.8 |
Avg global cache current block receive time (ms): | 25.7 |
Avg global cache cr block build time (ms): | 0.0 |
Avg global cache cr block send time (ms): | 0.0 |
Global cache log flushes for cr blocks served %: | 20.4 |
Avg global cache cr block flush time (ms): | 2.3 |
Avg global cache current block pin time (ms): | 1.2 |
Avg global cache current block send time (ms): | 0.0 |
Global cache log flushes for current blocks served %: | 2.2 |
Avg global cache current block flush time (ms): | 6.8 |
Global Cache and Enqueue Services – Messaging Statistics
Avg message sent queue time (ms): | 0.2 |
Avg message sent queue time on ksxp (ms): | 36.0 |
Avg message received queue time (ms): | 0.0 |
Avg GCS message process time (ms): | 0.0 |
Avg GES message process time (ms): | 0.0 |
% of direct sent messages: | 35.89 |
% of indirect sent messages: | 63.86 |
% of flow controlled messages: | 0.25 |
Interconnect Ping Latency Stats
Target Instance | 500B Ping Count | Avg Latency 500B msg | Stddev 500B msg | 8K Ping Count | Avg Latency 8K msg | Stddev 8K msg |
---|---|---|---|---|---|---|
1 | 71 | 0.20 | 0.06 | 71 | 0.20 | 0.06 |
2 | 71 | 4.60 | 7.01 | 71 | 4.71 | 7.03 |
Node2
RAC Statistics
 | Begin | End |
---|---|---|
Number of Instances: | 2 | 2 |
Global Cache Load Profile
 | Per Second | Per Transaction |
---|---|---|
Global Cache blocks received: | 6,799.76 | 5.11 |
Global Cache blocks served: | 7,779.91 | 5.85 |
GCS/GES messages received: | 57,138.25 | 42.96 |
GCS/GES messages sent: | 51,467.70 | 38.70 |
DBWR Fusion writes: | 563.76 | 0.42 |
Estd Interconnect traffic (KB) | 137,849.40 |
Global Cache and Enqueue Services – Workload Characteristics
Avg global enqueue get time (ms): | 53.1 |
Avg global cache cr block receive time (ms): | 4.3 |
Avg global cache current block receive time (ms): | 7.2 |
Avg global cache cr block build time (ms): | 0.0 |
Avg global cache cr block send time (ms): | 0.0 |
Global cache log flushes for cr blocks served %: | 11.7 |
Avg global cache cr block flush time (ms): | 5.0 |
Avg global cache current block pin time (ms): | 3.8 |
Avg global cache current block send time (ms): | 0.0 |
Global cache log flushes for current blocks served %: | 2.1 |
Avg global cache current block flush time (ms): | 14.7 |
Global Cache and Enqueue Services – Messaging Statistics
Avg message sent queue time (ms): | 8.3 |
Avg message sent queue time on ksxp (ms): | 9.8 |
Avg message received queue time (ms): | 0.2 |
Avg GCS message process time (ms): | 0.1 |
Avg GES message process time (ms): | 0.0 |
% of direct sent messages: | 44.90 |
% of indirect sent messages: | 54.87 |
% of flow controlled messages: | 0.24 |
This problem has some similarities to the earlier case Troubleshooting high "gc current grant 2-way" and "gc cr grant 2-way" (Case 2) from a few years ago: from Estd Interconnect traffic (KB) we can see the traffic, 141,488 KB/s, had essentially hit the ceiling of a gigabit interconnect.
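To confirm saturation at the OS level, the throughput of the interconnect NIC can be sampled directly and compared with the roughly 125,000 KB/s practical ceiling of a 1 Gbit/s link. A minimal Linux sketch using the sysfs counters; IFACE is a placeholder for the private interface on your node:

```bash
#!/bin/bash
# Sample RX/TX throughput of the interconnect NIC from sysfs counters (Linux).
# IFACE is a placeholder -- use the private interconnect interface on your node.
IFACE=eth1
INTERVAL=5

rx1=$(cat /sys/class/net/$IFACE/statistics/rx_bytes)
tx1=$(cat /sys/class/net/$IFACE/statistics/tx_bytes)
sleep "$INTERVAL"
rx2=$(cat /sys/class/net/$IFACE/statistics/rx_bytes)
tx2=$(cat /sys/class/net/$IFACE/statistics/tx_bytes)

echo "RX KB/s: $(( (rx2 - rx1) / 1024 / INTERVAL ))"
echo "TX KB/s: $(( (tx2 - tx1) / 1024 / INTERVAL ))"
# A 1 Gbit/s link tops out at roughly 125,000 KB/s in each direction.
```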
The ksxp layer determines how the IPC protocol is carried; it can be examined through x$ksxpclient. The dba_hist_ic_client_stats view shows the same information, and that view does collect the statistics for all instances.
select name,
       trunc(bytes_sent/1024/1024) as mb_sent,
       trunc(bytes_rcv/1024/1024) as mb_recv,
       trunc((bytes_sent+bytes_rcv)/1024/1024) as mb_total,
       to_char(round(100*(((bytes_sent+bytes_rcv)/1024/1024)/
               sum((bytes_sent+bytes_rcv)/1024/1024) over ()),2),'990.00') as total_perc
  from sys.x$ksxpclient
 order by total_perc desc;
Poorly sized UDP receive (rx) buffer sizes / UDP buffer socket overflows
Run `netstat -s` or `netstat -su` and look for "udpInOverflows", "packet receive errors", "fragments dropped" or "outgoing packets dropped", depending on the platform.
If there is a significant increase in "outgoing packets dropped" in the TCP section of `netstat -s` output on the nodes remote to where the workload is running, increasing both wmem_default and wmem_max to 4 MB (on Linux) could resolve the issue.
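A minimal sketch of checking and raising those socket buffer limits with sysctl on Linux; the 4 MB value follows the suggestion above, and any change should be validated for your own workload:

```bash
#!/bin/bash
# Inspect the current socket buffer limits (Linux)
sysctl net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max

# Raise the send-side limits to 4 MB on the running system (not persistent across reboot)
sysctl -w net.core.wmem_default=4194304
sysctl -w net.core.wmem_max=4194304

# Persist by adding the same keys to /etc/sysctl.conf (or /etc/sysctl.d/) and running: sysctl -p
```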
Poor interconnect performance and high CPU utilization; `netstat -s` reports packet reassembly failures
Large UDP datagrams may be fragmented and sent in multiple frames based on Maximum Transmission Unit (MTU) size.
`netstat -s` reports a large number of Internet Protocol (IP) “reassembles failed” and “fragments dropped after timeout” in the “IP Statistics” section of the output on the receiving node.
`netstat -s` IP stat counters:
3104582 fragments dropped after timeout
34550600 reassemblies required
8961342 packets reassembled ok
3104582 packet reassembles failed.
On Linux 6:
netstat -s | grep reass
(wait 5 seconds)
netstat -s | grep reass
On Linux 7:
nstat -az | grep IpReasmFails
(wait 5 seconds)
nstat -az | grep IpReasmFails
Action: increase the fragment reassembly buffers to allocate more space for reassembly, increase the time allowed to reassemble packet fragments, increase the UDP receive buffers to accommodate network processing latencies that aggravate reassembly failures, and identify CPU utilization that negatively impacts network stack processing. Note that increasing the following settings will also increase memory usage.
On LINUX:
To modify reassembly buffer space, change the following thresholds:
/proc/sys/net/ipv4/ipfrag_low_thresh (default = 196608)
/proc/sys/net/ipv4/ipfrag_high_thresh (default = 262144)
To modify packet fragment reassembly times, modify:
/proc/sys/net/ipv4/ipfrag_time (default = 30)
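A minimal sketch of applying these on Linux with sysctl; the target values below are purely illustrative, not tuned values from this case:

```bash
#!/bin/bash
# Check the current IP fragment reassembly settings (Linux)
sysctl net.ipv4.ipfrag_low_thresh net.ipv4.ipfrag_high_thresh net.ipv4.ipfrag_time

# Example: enlarge the reassembly buffers (illustrative values only)
sysctl -w net.ipv4.ipfrag_low_thresh=393216
sysctl -w net.ipv4.ipfrag_high_thresh=524288

# Example: allow more time to reassemble a fragmented datagram (seconds)
sysctl -w net.ipv4.ipfrag_time=60

# Persist by adding the same keys to /etc/sysctl.conf and running: sysctl -p
```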
ADDM report
Finding 2: Interconnect Latency
Impact is 421.58 active sessions, 61.59% of total activity.
———————————————————–
Higher than expected latency of the cluster interconnect was responsible for significant database time on this instance.
The instance was consuming 414762 kilo bits per second of interconnect bandwidth.
65% of this interconnect bandwidth was used for global cache messaging, 0% for parallel query messaging and 33% for database lock management.
The average latency for 8K interconnect messages was 4708 microseconds. The instance is using the private interconnect device “lan901:801” with IP address 169.254.141.243 and source “33”.
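Since ADDM names the interconnect device and HAIP address explicitly, it is worth cross-checking which interfaces the clusterware has registered and which IPC protocol the binaries use. A small sketch using standard Grid Infrastructure tools, assuming GRID_HOME and ORACLE_HOME are set:

```bash
#!/bin/bash
# Show the networks registered with the clusterware and their roles
# (public vs. cluster_interconnect).
$GRID_HOME/bin/oifcfg getif

# Show the IPC protocol compiled into the database binaries (typically udp, or rds on Exadata)
$ORACLE_HOME/bin/skgxpinfo
```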
Finding 5: Global Cache Congestion
Impact is 80.49 active sessions, 11.76% of total activity.
———————————————————-
Global Cache Service Processes (LMSn) in other instances were not processing requests fast enough.
Recommendation 1: Database Configuration
Estimated benefit is 80.49 active sessions, 11.76% of total activity.
———————————————————————
Action
Increase throughput of the Global Cache Service (LMSn) processes.
Increase the number of Global Cache Service processes by increasing the value of the parameter "gcs_server_processes". Alternatively, if the host is CPU bound, consider increasing the OS priority of the Global Cache Service processes.
Rationale
The value of parameter "gcs_server_processes" was "20" during the analysis period.
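On Linux, a quick sanity check of this recommendation is to count the LMS processes and confirm they run in the real-time scheduling class, which is what raising their OS priority amounts to; the ora_lms* name pattern is the usual convention:

```bash
#!/bin/bash
# Show scheduling class / real-time priority of the LMS processes (Linux).
# On a healthy RAC node LMS normally runs in the RR (real-time) class.
ps -eo pid,class,rtprio,ni,pcpu,comm | grep -i '[l]ms'

# Count the LMS processes per node to compare with gcs_server_processes
ps -e -o comm | grep -ci '^ora_lms'
```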
Just a brief record of the symptoms: as it turned out, this case was triggered by the application automatically failing over to the other instance after its connection count was exhausted, so that scheduled jobs ran heavy inserts across both instances and amplified the GC traffic; along the way it also caused undo latch contention and index root block splits. This post only records the gc symptoms seen once the interconnect network bandwidth was exhausted.