ANBOB™



What are the symptoms of insufficient network bandwidth for Oracle Data Guard redo transport?

2024/06/23
ORACLE 9i-23ai

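Insufficient network bandwidth for Oracle Data Guard redo transport usually shows up as a growing standby lag, but what other symptoms appear at the database layer? When troubleshooting a large standby lag, the most important step is to identify the nature of the problem correctly: is it in the network itself, in Data Guard, in the Oracle database (primary or standby), or in another database? Even with the powerful tools Oracle provides, this is not easy. And identifying a network problem is not necessarily where the DBA's work ends. You might think you can hand the issue to the network administrator and wait for a fix, but that is not always the case, because a network problem can be mixed in with several other issues.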

1. ORA-16038 and ORA-12608 errors
The primary alert log shows:

TT05 (PID:74685): Attempting LAD:3 network reconnect (12608)
TT05 (PID:74685): LAD:3 network reconnect abandoned
TT05 (PID:74685): Error 12608 archiving LNO:1 to ''
...
...
 <Process> (PID:<OSPID>): Error 12608 archiving LNO:<Log number> to '<Standby database unique name>'
 ORA-16038: log <Log number> sequence# <Seq No> cannot be archived

The standby alert log shows:

 rfs (PID:<OSPID>): Possible network disconnect with primary database

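These messages mean that archiving to the standby destination failed. Measure online redo generation volume and archiving frequency from V$ARCHIVED_LOG or the alert log, or collect an AWR report covering the same period.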
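To turn the archived log history into a redo generation rate, a minimal Python sketch like the following can help. It assumes you have already exported the FIRST_TIME, BLOCKS and BLOCK_SIZE columns of V$ARCHIVED_LOG for a single archive destination (for example DEST_ID = 1), since each log has one row per destination. The function names and sample rows are hypothetical, not part of any Oracle tool.

```python
from collections import defaultdict
from datetime import datetime

MB = 1024 * 1024  # 1 MB = 1024 x 1024 bytes


def redo_mb_per_hour(rows):
    """Sum archived redo volume per clock hour, in MB.

    Each row is (first_time, blocks, block_size), mirroring the FIRST_TIME,
    BLOCKS and BLOCK_SIZE columns of V$ARCHIVED_LOG. Assumption: rows are
    already filtered to one destination, otherwise logs are double counted.
    """
    totals = defaultdict(float)
    for first_time, blocks, block_size in rows:
        hour = first_time.replace(minute=0, second=0, microsecond=0)
        totals[hour] += blocks * block_size / MB
    return dict(totals)


def peak_hourly_rate_mb_per_sec(rows):
    """Busiest hour's redo volume averaged over 3600 seconds.

    Hourly averaging smooths out short bursts, so the peak rate over
    shorter intervals can be higher than this value.
    """
    per_hour = redo_mb_per_hour(rows)
    return max(per_hour.values()) / 3600 if per_hour else 0.0


if __name__ == "__main__":
    # Hypothetical sample: three 1 GB logs, two of them in the 10:00 hour.
    sample = [
        (datetime(2024, 6, 23, 10, 5), 2_097_152, 512),
        (datetime(2024, 6, 23, 10, 40), 2_097_152, 512),
        (datetime(2024, 6, 23, 11, 15), 2_097_152, 512),
    ]
    print(redo_mb_per_hour(sample))
    print(f"{peak_hourly_rate_mb_per_sec(sample):.3f} MB/s")
```

Bucketing by FIRST_TIME is an approximation; for burst analysis, the AWR redo size statistics over short snapshots are more precise.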

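2. High network throughput utilization
Check NIC traffic on both the primary and standby hosts, for example with sar -n DEV 3, to see whether utilization is high. Check the entire network path, including cabling, NICs, switches and firewalls. OS diagnostic data is also very helpful, for example OSWatcher (or ExaWatcher on Exadata).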
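A rough way to judge whether a NIC is saturated is to compare the sar throughput with the link speed. The sketch below assumes the classic sysstat column names (IFACE, rxkB/s, txkB/s), 1 kB = 1024 bytes, and a link speed you supply yourself; the helper names and sample lines are hypothetical. Recent sysstat versions may also print %ifutil directly, which is the simpler check when available.

```python
def parse_sar_dev(header: str, line: str) -> dict:
    """Map a `sar -n DEV` data line onto its header columns.

    Columns are aligned from the right, so a time stamp with or without
    an AM/PM token does not shift the named columns.
    """
    names, values = header.split(), line.split()
    return dict(zip(reversed(names), reversed(values)))


def link_utilization_pct(rx_kb_s: float, tx_kb_s: float, link_mbit_s: float,
                         kb_bytes: int = 1024) -> float:
    """Utilization of the busier direction of a full-duplex link, in percent.

    Assumptions: sar's kB is 1024 bytes; link speed is in Mbit/s (10^6 bits).
    """
    busiest_bytes_s = max(rx_kb_s, tx_kb_s) * kb_bytes
    link_bytes_s = link_mbit_s * 1_000_000 / 8
    return 100.0 * busiest_bytes_s / link_bytes_s


if __name__ == "__main__":
    # Hypothetical sar output for a 1 Gbit/s interface on the primary host,
    # where outbound (tx) traffic carries the redo.
    header = "12:00:01 AM IFACE rxpck/s txpck/s rxkB/s txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil"
    line = "12:00:04 AM eth0 900.00 25000.00 120.00 30000.00 0.00 0.00 0.00 24.58"
    row = parse_sar_dev(header, line)
    pct = link_utilization_pct(float(row["rxkB/s"]), float(row["txkB/s"]), 1000)
    print(f"{row['IFACE']}: {pct:.1f}% of link")
```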

3. AWR or ASH wait events

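The TT process waits on the 'Data Guard network buffer stall reap' event. NSSn processes typically wait on 'LNS wait on SENDREQ', while RFS may appear idle. Oracle's network tuning paper (async-2587521.pdf) covers related optimizations, such as tuning the TCP socket buffer sizes: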

For example, if bandwidth is 622 Mbit/s and latency is 30 ms, then you would calculate the minimum size for the RECV_BUF_SIZE and SEND_BUF_SIZE parameters as follows:
Bandwidth Delay Product (BDP) = bandwidth x latency
BDP = 622,000,000 bits/s (bandwidth) / 8 x 0.030 s (latency) = 2,332,500 bytes
Given this example, the optimal send and receive socket buffer sizes are calculated as follows:
Socket buffer size = 3 x BDP
= 3 x 2,332,500 bytes (BDP)
= 6,997,500 bytes

With Oracle Net you can set the send and receive socket buffer sizes globally for all connections using the following parameters in sqlnet.ora:
RECV_BUF_SIZE=6997500
SEND_BUF_SIZE=6997500
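The bandwidth-delay product calculation and the resulting sqlnet.ora entries above can be reproduced with a small Python sketch. It follows the 3 x BDP rule quoted above and takes latency as the network round-trip time in microseconds (an assumption that also lets sub-millisecond LAN latencies be used); the function names are illustrative only.

```python
def bdp_bytes(bandwidth_bit_s: int, latency_us: int) -> int:
    """Bandwidth-delay product in bytes.

    Integer arithmetic avoids float rounding:
    bits/s * microseconds / (8 bits per byte * 1,000,000 us per s).
    """
    return bandwidth_bit_s * latency_us // 8_000_000


def socket_buffer_bytes(bandwidth_bit_s: int, latency_us: int, factor: int = 3) -> int:
    """Socket buffer size using the 3 x BDP rule of thumb."""
    return factor * bdp_bytes(bandwidth_bit_s, latency_us)


def sqlnet_ora_lines(size: int) -> list:
    """sqlnet.ora entries that set both socket buffers to the same size."""
    return [f"RECV_BUF_SIZE={size}", f"SEND_BUF_SIZE={size}"]


if __name__ == "__main__":
    size = socket_buffer_bytes(622_000_000, 30_000)  # 622 Mbit/s, 30 ms
    # prints RECV_BUF_SIZE=6997500 and SEND_BUF_SIZE=6997500 on two lines
    print("\n".join(sqlnet_ora_lines(size)))
```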

Solutions

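Increase the network bandwidth available for redo transport
Configure a dedicated network for redo transport
Spread the application workload to avoid concentrated bursts of redo generation
Right-size the online redo logs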

Peak redo rate (per EM or AWR reports)    Recommended redo log group size
<= 5 MB/sec                               4 GB
<= 25 MB/sec                              16 GB
<= 50 MB/sec                              32 GB
> 50 MB/sec                               64 GB
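The sizing table above is simple enough to encode directly, for example when scripting a review of several databases. This is only a transcription of the table into a hypothetical helper; the peak redo rate in MB/s is something you supply (for example from AWR).

```python
# (upper bound in MB/s, recommended redo log group size in GB),
# transcribed from the sizing table.
REDO_LOG_SIZING = [(5, 4), (25, 16), (50, 32)]
LARGEST_SIZE_GB = 64  # for peak redo rates above 50 MB/s


def recommended_redo_log_size_gb(peak_mb_per_sec: float) -> int:
    """Return the recommended redo log group size for a peak redo rate."""
    for upper_bound, size_gb in REDO_LOG_SIZING:
        if peak_mb_per_sec <= upper_bound:
            return size_gb
    return LARGEST_SIZE_GB


if __name__ == "__main__":
    for rate in (3, 20, 45, 80):  # hypothetical peak rates in MB/s
        print(f"{rate} MB/s -> {recommended_redo_log_size_gb(rate)} GB")
```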

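Data Guard redo transport normally runs over TCP.
In SYNC redo transport mode, slow network redo transport can cause performance delays on the primary database, because commits on the primary wait for the standby to acknowledge receipt of the redo.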

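The oratcptest tool
To get the full picture, capture tcpdump traces on both sides. Oracle also provides a dedicated tool, oratcptest, a Java program (JAR file) that can be downloaded from Oracle MOS Note 2064368.1. It helps assess network resource usage for Data Guard redo transport, OGG (GoldenGate), migrations and similar scenarios.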

oratcptest runs in client-server mode.

Standby (server):

java -jar oratcptest.jar -server <server_name_or_ip> -port=<port_number>

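Primary (client), pointing at the standby host and the same port as the server (choose a port that is not already in use, for example by the listener):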

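## ASYNC
java -jar oratcptest.jar <server_name_or_ip> -port=<port_number> -mode=async -duration=100s -interval=20s -length=8k

## SYNC (the server writes each message to disk before acknowledging, similar to SYNC AFFIRM)
java -jar oratcptest.jar <server_name_or_ip> -port=<port_number> -mode=sync -duration=100s -interval=20s -length=8k -write

## FastSync (SYNC without the disk write, similar to SYNC NOAFFIRM)
java -jar oratcptest.jar <server_name_or_ip> -port=<port_number> -mode=sync -duration=100s -interval=20s -length=8k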

Sample output of the SYNC test:

[Requesting a test]
        Message payload        = 8 kbytes
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = (system default)
        Transport mode         = SYNC
        Disk write             = YES
        Statistics interval    = 20 seconds
        Test duration          = 100 seconds
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)
 
(12:10:30) The server is ready.
                    Throughput             Latency
(12:10:50)      7.544 Mbytes/s            1.037 ms   (disk-write 0.564 ms)
(12:11:10)      7.600 Mbytes/s            1.029 ms   (disk-write 0.568 ms)
(12:11:30)      7.641 Mbytes/s            1.024 ms   (disk-write 0.563 ms)
(12:11:50)      7.702 Mbytes/s            1.016 ms   (disk-write 0.557 ms)
(12:12:10)      7.600 Mbytes/s            1.029 ms   (disk-write 0.569 ms)
(12:12:10) Test finished.
               Socket send buffer = 166400 bytes
                  Avg. throughput = 7.617 Mbytes/s
                     Avg. latency = 1.027 ms (disk-write 0.564 ms)
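When oratcptest is run repeatedly, a small parser can summarize the per-interval lines. The regular expression below is based only on the sample output above (other oratcptest versions may format lines differently), and has_headroom is a hypothetical check that compares the slowest interval against a peak redo rate you supply in the same units.

```python
import re

# Matches interval lines such as
# "(12:10:50)      7.544 Mbytes/s            1.037 ms   (disk-write 0.564 ms)".
# Assumption: format taken from the sample SYNC output; the optional
# disk-write part is not required, so ASYNC-style lines match as well.
INTERVAL_LINE = re.compile(
    r"^\(\d{2}:\d{2}:\d{2}\)\s+([\d.]+) Mbytes/s\s+([\d.]+) ms"
)


def summarize_oratcptest(output: str) -> dict:
    """Average/min throughput (Mbytes/s) and average latency (ms) per interval."""
    throughput, latency = [], []
    for line in output.splitlines():
        match = INTERVAL_LINE.match(line.strip())
        if match:
            throughput.append(float(match.group(1)))
            latency.append(float(match.group(2)))
    if not throughput:
        raise ValueError("no interval lines found")
    return {
        "intervals": len(throughput),
        "avg_throughput": sum(throughput) / len(throughput),
        "min_throughput": min(throughput),
        "avg_latency_ms": sum(latency) / len(latency),
    }


def has_headroom(summary: dict, peak_redo_mbytes_per_sec: float) -> bool:
    """Hypothetical check: True if even the slowest interval exceeds the
    peak redo rate. Both values must use oratcptest's unit
    (1 Mbyte = 1024x1024 bytes)."""
    return summary["min_throughput"] > peak_redo_mbytes_per_sec
```

Looking at the minimum as well as the average is deliberate: a link that keeps up on average can still fall behind during its slowest intervals.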

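Summary
Network bottlenecks can cause Data Guard lag. They show up as ORA errors in the primary and standby alert logs and as related wait events at the database layer, so also collect network statistics at the OS level. Oracle provides the oratcptest tool for measuring network throughput and latency.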

— enjoy —


ORA-12608, ORA-16038
