CRS startup failure after adding a whitelist (ipfilter) on AIX (CRS-1612: Network communication xxx timeout)
Security has been a focal point in recent years, with a steady stream of news about websites leaking sensitive information. Since the Cybersecurity Law came into force there is a clear legal basis for enforcement, and during a recent security-awareness campaign I saw the interpretation of Article 286 of the Criminal Law regarding persons directly responsible, which made me worry on behalf of every DBA and operations engineer. For organizations running "critical information infrastructure", the group company and ministries have also started security audits. As a result, much of this year has been spent on security-related work, which is how the following incident came about.
One evening, availability alarms fired for five databases at almost the same time: in every two-node RAC, node 2 had crashed. A scan of the logs showed split-brain. All environments were 11.2.0.3 two-node RAC on AIX, and which node survived was determined by the eviction algorithm of that version: before 12c, when only the network heartbeat fails, the node with the lowest node number is kept. This changed in 12c, where a new algorithm introduced node weights, so that when a split-brain occurs the node with the higher weight survives.
We were told there had been no network-policy or hardware changes at the time, yet CRS could not start. A few log excerpts from the restart attempts follow.
# Node2 GI alert log
2018-09-18 17:30:34.985 [gpnpd(5571146)]CRS-2328:GPNPD started on node anbob2.
2018-09-18 17:30:38.430 [cssd(4326132)]CRS-1713:CSSD daemon is started in clustered mode
2018-09-18 17:30:39.944 [ohasd(4784458)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2018-09-18 17:30:58.605 [cssd(4326132)]CRS-1707:Lease acquisition for node anbob2 number 2 completed
2018-09-18 17:31:00.017 [cssd(4326132)]CRS-1605:CSSD voting file is online: /dev/rlv_vote2; details in /oracle/app/11.2.0.3/grid/log/anbob2/cssd/ocssd.log.
2018-09-18 17:31:00.020 [cssd(4326132)]CRS-1605:CSSD voting file is online: /dev/rlv_vote3; details in /oracle/app/11.2.0.3/grid/log/anbob2/cssd/ocssd.log.
2018-09-18 17:31:00.032 [cssd(4326132)]CRS-1605:CSSD voting file is online: /dev/rlv_vote1; details in /oracle/app/11.2.0.3/grid/log/anbob2/cssd/ocssd.log.
2018-09-18 17:31:20.572 [cssd(4326132)]CRS-1612:Network communication with node anbob1 (1) missing for 50% of timeout interval.  Removal of this node from cluster in 14.376 seconds
2018-09-18 17:31:27.573 [cssd(4326132)]CRS-1611:Network communication with node anbob1 (1) missing for 75% of timeout interval.  Removal of this node from cluster in 7.375 seconds
2018-09-18 17:31:32.573 [cssd(4326132)]CRS-1610:Network communication with node anbob1 (1) missing for 90% of timeout interval.  Removal of this node from cluster in 2.375 seconds
2018-09-18 17:31:34.955 [cssd(4326132)]CRS-1609:This node is unable to communicate with other nodes in the cluster and is going down to preserve cluster integrity; details at (:CSSNM00008:) in /oracle/app/11.2.0.3/grid/log/anbob2/cssd/ocssd.log.
2018-09-18 17:31:34.955 [cssd(4326132)]CRS-1656:The CSS daemon is terminating due to a fatal error; Details at (:CSSSC00012:) in /oracle/app/11.2.0.3/grid/log/anbob2/cssd/ocssd.log
2018-09-18 17:31:35.020 [cssd(4326132)]CRS-1603:CSSD on node anbob2 shutdown by user.
# Node2 crsd log
2018-09-10 15:32:21.234: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2018-09-10 15:32:21.434: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2018-09-10 15:32:21.635: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
2018-09-10 15:32:21.672: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 4294967295 ms, node 110ef48f0 { host 'anbob1', haName '420d-6a69-ed3b-01e1', srcLuid 0d64970f-8036598f, dstLuid 00000000-00000000 numInf 1, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [998 : 998], createTime 1975300868, sentRegister 1, localMonitor 0, flags 0x4 }
2018-09-10 15:32:21.835: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
...
2018-09-10 15:32:24.237: [ OCRMAS][3342]proath_master:100b: Polling, connect to master not complete retval1 = 203, retval2 = 203
# Node1 crsd log
2018-09-10 15:25:44.586: [GIPCHALO][2314] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30006, msg 116893738 { len 1160, seq 572, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 571, flags 0x1, srcLuid 0d64970f-8036598f, dstLuid 00000000-00000000, msgId 570 }, node 11177a490 { host 'anbob2', haName '0e69-b2a9-e176-7ec8', srcLuid 535d9395-30941506, dstLuid 0d64970f-8036598f numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [59 : 59], createTime 2526153249, sentRegister 1, localMonitor 0, flags 0x0 }
2018-09-10 15:25:45.586: [GIPCHALO][2314] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30006, msg 1168a2898 { len 1160, seq 573, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 572, flags 0x1, srcLuid 0d64970f-8036598f, dstLuid 00000000-00000000, msgId 571 }, node 11177a490 { host 'anbob2', haName '0e69-b2a9-e176-7ec8', srcLuid 535d9395-30941506, dstLuid 0d64970f-8036598f numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [60 : 60], createTime 2526153249, sentRegister 1, localMonitor 0, flags 0x0 }
2018-09-10 15:25:45.586: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 2526214261 ms, node 11177a490 { host 'anbob2', haName '0e69-b2a9-e176-7ec8', srcLuid 535d9395-30941506, dstLuid 0d64970f-8036598f numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [60 : 60], createTime 2526153249, sentRegister 1, localMonitor 0, flags 0x4 }
...
2018-09-10 15:25:49.587: [GIPCHALO][2314] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30006, msg 1168af158 { len 1160, seq 577, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 576, flags 0x1, srcLuid 0d64970f-8036598f, dstLuid 00000000-00000000, msgId 575 }, node 11177a490 { host 'anbob2', haName '0e69-b2a9-e176-7ec8', srcLuid 535d9395-30941506, dstLuid 0d64970f-8036598f numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [64 : 64], createTime 2526153249, sentRegister 1, localMonitor 0, flags 0x0 }
2018-09-10 15:31:36.663: [GIPCHALO][2314] gipchaLowerProcessNode: no valid interfaces found to node for 2526565337 ms, node 111775ef0 { host 'anbob2', haName '0e69-b2a9-e176-7ec8', srcLuid 535d9395-9e9ed2e1, dstLuid 0d64970f-8036598f numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [50 : 50], createTime 2526514328, sentRegister 1, localMonitor 0, flags 0x4 }
2018-09-10 15:31:37.663: [GIPCHALO][2314] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30006, msg 11688a4b8 { len 1160, seq 925, type gipchaHdrTypeRecvEstablish (5), lastSeq 0, lastAck 0, minAck 924, flags 0x1, srcLuid 0d64970f-8036598f, dstLuid 00000000-00000000, msgId 923 }, node 111775ef0 { host 'anbob2', haName '0e69-b2a9-e176-7ec8', srcLuid 535d9395-9e9ed2e1, dstLuid 0d64970f-8036598f numInf 0, contigSeq 0, lastAck 0, lastValidAck 0, sendSeq [51 : 51], createTime 2526514328, sentRegister 1, localMonitor 0, flags 0x0 }
2018-09-10 15:31:38.663: [GIPCHALO][2314] gipchaLowerDropMsg: dropping because of sequence timeout, waited 30007
# Node2 cssd log
2018-09-18 17:31:02.553: [ CSSD][1029]clssgmClientConnectMsg: msg flags 0x0000
2018-09-18 17:31:03.032: [ CSSD][2587]clssnmvDHBValidateNcopy: node 1, anbob1, has a disk HB, but no network HB, DHB has rcfg 432128556, wrtcnt, 37948455, LATS 2674619339, lastSeqNo 37948452, uniqueness 1536633719, timestamp 1537263062/3224929250
2018-09-18 17:31:03.052: [ CSSD][4900]clssgmWaitOnEventValue: after CmInfo State  val 3, eval 1 waited 0
2018-09-18 17:31:03.067: [ CSSD][4129]clssnmvDHBValidateNcopy: node 1, anbob1, has a disk HB, but no network HB, DHB has rcfg 432128556, wrtcnt, 37948456, LATS 2674619374, lastSeqNo 37948453, uniqueness 1536633719, timestamp 1537263062/3224929768
2018-09-18 17:31:03.687: [ CSSD][5928]clssnmConnSetNames: hostname anbob1 privname 192.168.43.21 con 60d
2018-09-18 17:31:03.687: [ CSSD][5928]clssnmSetNodeProperties: properties node 1 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17
# Node2 gipc log
[ OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
[ OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
[ OCRMSG][1]GIPC error [29] msg [gipcretConnectionRefused]
...
Note:
Up to this point everything pointed to a problem with interconnect (heartbeat) communication, yet manually pinging the private IPs and HAIP addresses between the nodes worked, and traceroute worked too. I had run into a network-related case before (<Crsd start fail and crsd.log show "Policy Engine is not initialized yet" & evmd.log show "[gipcretConnectionRefused] [29]">), and many of these log entries looked similar.
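For reference, the manual checks amounted to something like the following; note that ping and traceroute only exercise ICMP, so they can succeed even while a port filter is dropping the UDP traffic the CSS heartbeat actually uses. The peer address 192.168.43.22 is this environment's node-2 private IP; adapt to yours.

```shell
# ICMP-level checks only -- these all passed in this incident even though
# CRS could not communicate, because the filter dropped specific UDP ports,
# not ICMP.
ping -c 3 192.168.43.22       # peer private IP
traceroute 192.168.43.22      # route to the peer over the interconnect
```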
I remembered that a whitelist had been added at the host (OS) layer a few days earlier, so I suggested the system engineers disable it first. Shortly afterwards the sysadmin reported it was off and ready to test. (A reminder to fellow DBAs: when troubleshooting, always verify things yourself; never just take anyone's word for it.) We manually restarted node 2 again, and something worse happened: as node 2 started, node 1 was evicted, and then node 2 failed to start anyway, leaving no node able to provide service. I still have not found a good way to avoid this situation. Node 1 then would not start manually either, so to restore service quickly we rebooted the OS, after which node 1 came up normally. That made further analysis even harder, since trying to start node 2 might take down node 1 again; fortunately it was the middle of the night and one of the databases could pause its workload for a while. A MOS note about AIX also mentions:
"IBM PowerSC disables some UDP/TCP related features and has network packet filtering feature, it blocks the private network layer communication, causing CRSD can not communicate with each other and the 2nd node CRSD can not join the cluster."
The sysadmins offered no explanation, but then news arrived that the trigger had been found: at exactly that time, the security team had been running a network-wide port scan. They claimed they were only scanning hosts with NFS enabled, yet one of the affected databases had no NFS and its node was evicted all the same. After repeated tests, a colleague stopped the entire IPsec service, and node 2's CRS started normally. It turned out the system engineer had only rolled back the recently added port restrictions, not the whole whitelist, which sent me down quite a few dead ends. To restore service as quickly as possible we then stopped the IPsec service on the affected database hosts, and the databases came back.
How to stop IPsec (IPv4):
$ smitty ipsec4
>> Start/Stop IP Security
>>>>Stop IP Security
>>>>>>KEEP definition in database [yes]
Command: OK stdout: yes stderr: no
Before command completion, additional instructions may appear below.
ipsec_v4 Defined
anbob1:/>$ lsdev -l ipsec_v4
ipsec_v4 Defined IP Version 4 Security Extension
About IPsec
IPsec is a protocol used to create encrypted communication channels between servers; such a channel is often called a tunnel or VPN tunnel. This article will not discuss IPsec in detail. If you want to use IPsec in your environment, make sure the following filesets are installed:
bos.msg.LANG.net.ipsec
bos.net.ipsec.websm
bos.crypto-priv
lsfilt: lists the filter rules in the table. Each rule is assigned a number when it is created, which this command displays.
genfilt: adds a filter rule to the table; this is the command used to create new filters. If the -n flag is not used to specify a position, the new rule is appended to the end of the table.
chfilt: changes an existing filter rule. You supply the rule ID to indicate which rule to modify. Rule 1 is the default rule and cannot be changed with this command.
rmfilt: the rm prefix should be familiar to every UNIX administrator. You can use this command to delete a filter rule at any time by rule ID.
mkfilt: an important command; it activates or deactivates the filter rules in the table, enables or disables filter logging, and changes the default rule. Changes to the filter table only take effect after running this command with the appropriate flags.
When we talk about policy in TCP/IP filtering, there are generally two possible approaches:
Deny all traffic by default, permitting only what you explicitly allow.
Permit all traffic by default, denying only what you explicitly restrict.
The policy here was the second: permit all traffic by default, deny all communication on specific port numbers, and then permit those ports for specific IP ranges, e.g. SSH on port 22. We applied the same host whitelist rules to a test host; the ports configured that day were 22, 2049, and 123.
Testing showed that restricting port 22 does not affect CRS startup. Confirm IPsec is enabled:
anbob1:/> lsdev -l ipsec_v4
ipsec_v4 Available  IP Version 4 Security Extension

anbob1:/> lsfilt -v4
Beginning of IPv4 filter rules.
Rule 1:
Rule action         : permit
Source Address      : 0.0.0.0
Source Mask         : 0.0.0.0
Destination Address : 0.0.0.0
Destination Mask    : 0.0.0.0
Source Routing      : no
Protocol            : udp
Source Port         : eq 4001
Destination Port    : eq 4001
Scope               : both
Direction           : both
Logging control     : no
Fragment control    : all packets
Tunnel ID number    : 0
Interface           : all
Auto-Generated      : yes
Expiration Time     : 0
Description         : Default Rule

genfilt syntax:
genfilt -v 4|6 [-n fid] [-a D|P|I|L|E|H|S] -s s_addr -m s_mask [-d d_addr] [-M d_mask] [-g Y|N] [-c protocol] [-o s_opr] [-p s_port] [-O d_opr] [-P d_port] [-r R|L|B] [-w I|O|B] [-l Y|N] [-f Y|N|O|H] [-t tid] [-i interface] [-D description] [-e expiration_time] [-x quoted_pattern] [-X pattern_filename] [-C antivirus_filename]

-v 4|6                IP version.
-n fid                Insert the new rule before rule fid.
-a action             D(eny) | P(ermit) | I(f) | (e)L(se) | E(ndif). Every IF rule must be closed by a matching ENDIF rule.
-s s_addr             Source address.
-m s_mask             Source address mask.
-d d_addr             Destination address.
-M d_mask             Destination address mask.
-g Y|N                For permit rules; default Y, meaning the rule also applies to IP packets that use source routing.
-c protocol           Protocol; default all. Valid values: udp/icmp/icmpv6/tcp/tcp.ack/ospf/ipip/esp/ah/all.
-o s_opr              Source port / ICMP type operation. Valid values: lt/le/gt/ge/eq/neq/any; default any; must be any when -c ospf.
-p s_port             Source port or ICMP type.
-O d_opr              Destination port / ICMP code operation. Valid values: lt/le/gt/ge/eq/neq/any; default any; must be any when -c ospf.
-P d_port             Destination port or ICMP code.
-r R|L|B              Routing; default B. R applies to forwarded packets, L to packets to or from the local host, B to both.
-w I|O|B              Default B. I applies to inbound packets, O to outbound, B to both. With the -x/-X/-C pattern options, O is invalid; B is accepted but only inbound packets are checked.
-l Y|N                Whether to log packets that match the rule; default N.
-f Y|N|O|H            Fragment control; default Y (all packets). N: unfragmented packets only; O: fragments and fragment headers only; H: fragment headers and unfragmented packets only.
-t tid                Tunnel ID associated with this rule; all matching packets go through that tunnel. If omitted, the rule applies only to non-tunnel traffic.
-i interface          Interface; default all.
-D description        Free-text description.
-e expiration_time    Expiration time in seconds.
-x pattern            Match pattern.
-X patternfile        File of match patterns, one per line.
-C antivirus_filename Antivirus pattern file name; implies certain versions of the ClamAV virus database.
Enable Logging
## Back up the syslog.conf file before modifying it.
cp /etc/syslog.conf /etc/syslog.conf.bak20180918

## Append an entry for IP filter logs.
echo "local4.debug /var/adm/ipsec.log" >> /etc/syslog.conf

## Create the log file and set permissions (permissions may depend on
## company policies).
touch /var/adm/ipsec.log
chmod 644 /var/adm/ipsec.log

## Refresh the syslog subsystem to activate the new configuration.
refresh -s syslogd
0513-095 The request for subsystem refresh was completed successfully.
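Once syslog is configured as above and a filter rule is created with logging enabled (the -l Y flag of genfilt), denied packets are recorded in the log. A quick way to check whether the filter is dropping interconnect traffic (the subnet here is this environment's private network; adapt as needed):

```shell
# Watch the IP filter log for entries involving the private interconnect subnet.
tail -f /var/adm/ipsec.log | grep 192.168.43
```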
Common IPsec commands
oracle@anbob1:/home/oracle:11G> netstat -in
Name  Mtu   Network     Address            Ipkts      Ierrs  Opkts     Oerrs  Coll
en12  1500  link#2      34.40.b5.a8.cd.ce  140545925  0      82952998  2      0
en12  1500  133.96.43   133.96.43.21       140545925  0      82952998  2      0
en12  1500  133.96.43   133.96.43.221      140545925  0      82952998  2      0
en12  1500  133.96.43   133.96.43.121      140545925  0      82952998  2      0
en13  1500  link#3      34.40.b5.a8.cd.66  10671342   0      1972523   2      0
en13  1500  192.168.43  192.168.43.21      10671342   0      1972523   2      0
en13  1500  169.254     169.254.47.5       10671342   0      1972523   2      0
lo0   16896 link#1                         60300139   0      60299825  0      0
lo0   16896 127         127.0.0.1          60300139   0      60299825  0      0
lo0   16896 ::1%1                          60300139   0      60299825  0      0

anbob1:/var/adm> ifconfig en13
en13: flags=1e080863,c0<UP,BROADCAST,NOTRAILERS,RUNNING,SIMPLEX,MULTICAST,GROUPRT,64BIT,CHECKSUM_OFFLOAD(ACTIVE),LARGESEND,CHAIN>
        inet 192.168.43.21 netmask 0xffffff00 broadcast 192.168.43.255
        inet 169.254.47.5 netmask 0xffff0000 broadcast 169.254.255.255
        tcp_sendspace 131072 tcp_recvspace 65536 rfc1323 0

# Add a rule
anbob2:/> genfilt -v 4 -a P -s 192.168.43.0 -m 255.255.255.0 -d 0 -M 0 -g Y -c udp -O eq -P 123 -w I -l Y -i en12
Filter rule 17 for IPv4 has been added successfully.

# Activate the rules
anbob2:/> mkfilt -v 4 -g stop -u

# List the rules
anbob2:/> lsfilt -v 4 -O|grep 123
13|deny|0.0.0.0|0.0.0.0|0.0.0.0|0.0.0.0|yes|udp|any|0|eq|123|both|inbound|yes|all packets|0|all|0|||
16|permit|133.96.60.0|255.255.255.0|0.0.0.0|0.0.0.0|yes|udp|any|0|eq|123|both|inbound|yes|all packets|0|all|0|||
17|permit|192.168.43.0|255.255.255.0|0.0.0.0|0.0.0.0|yes|udp|any|0|eq|123|both|inbound|yes|all packets|0|en12|0|||

# Change rule #17 to the interconnect (private) network interface
anbob2:/> chfilt -v 4 -n 17 -i en13
Filter rule 17 for IPv4 has been changed successfully.

anbob2:/> mkfilt -v 4 -g start -u
anbob2:/> lsfilt -v 4 -O|grep 123
13|deny|0.0.0.0|0.0.0.0|0.0.0.0|0.0.0.0|yes|udp|any|0|eq|123|both|inbound|yes|all packets|0|all|0|||
16|permit|133.96.60.0|255.255.255.0|0.0.0.0|0.0.0.0|yes|udp|any|0|eq|123|both|inbound|yes|all packets|0|all|0|||
17|permit|192.168.43.0|255.255.255.0|0.0.0.0|0.0.0.0|yes|udp|any|0|eq|123|both|inbound|yes|all packets|0|en13|0|||

# Remove a rule
anbob2:/> rmfilt -v 4 -n 13
Filter rule 13 for IPv4 has been removed successfully.
anbob2:/> mkfilt -v 4 -u

# Mind the rule order: permit rules must come before deny rules
anbob2:/> lsfilt -v4 -O|grep 123
14|permit|133.96.60.0|255.255.255.0|0.0.0.0|0.0.0.0|yes|udp|any|0|eq|123|both|inbound|yes|all packets|0|en12|0|||
15|permit|192.168.43.0|255.255.255.0|0.0.0.0|0.0.0.0|yes|udp|any|0|eq|123|both|inbound|yes|all packets|0|en13|0|||
17|deny|0.0.0.0|0.0.0.0|0.0.0.0|0.0.0.0|yes|udp|any|0|eq|123|both|inbound|yes|all packets|0|all|0|||
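The ordering caveat matters because, like most packet filters, AIX IP filter rules are evaluated first-match-wins: if a catch-all deny precedes the narrow permits, the permits are never reached and the interconnect traffic is silently dropped. The effect can be demonstrated with a toy first-match evaluator (plain POSIX shell, no AIX commands; `match` and its rule strings are illustrative stand-ins, not real lsfilt entries):

```shell
#!/bin/sh
# Toy first-match rule evaluation. Each rule is "action|glob-pattern",
# e.g. 'deny|*' is a catch-all, 'permit|192.168.43.*' matches one subnet.
# match prints the action of the first rule whose pattern matches the IP.
match() {
  ip=$1; shift
  for rule in "$@"; do
    action=${rule%%|*}
    pattern=${rule#*|}
    case $ip in
      $pattern) echo "$action"; return ;;
    esac
  done
  echo permit   # default rule: permit everything not otherwise matched
}

# Deny-first ordering: the private subnet is blocked despite the permit rule.
match 192.168.43.21 'deny|*' 'permit|192.168.43.*'      # prints: deny

# Permit-first ordering: the private subnet gets through.
match 192.168.43.21 'permit|192.168.43.*' 'deny|*'      # prints: permit
```

This is exactly why rmfilt/genfilt were used above to move the deny rule after the permits.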
Summary:
This incident was caused by the whitelist configured on the hosts earlier: the security scan crashed node 2 of each CRS cluster, and during the automatic CRS restart the whitelist again broke heartbeat communication between the RAC nodes, so the CRS stack could not start. At that point, manually starting node 2 could even crash node 1. I have found no official documentation describing the role of port 123 here. Port 123 is, of course, the NTP port, and these database hosts use NTP for time synchronization, but the NTP server's IP range was permitted, so I do not know whether Oracle's code performs some check on that port. In the SR, Oracle Support only stated that Oracle RAC does not support any network firewall restrictions on the private network, and tcpdump showed no traffic on port 123 between the nodes either.
tcpdump -i en13 -vnn 'dst host 192.168.43.22 and dst port 123'
tcpdump -i en13 -vnn 'dst port 123'
tcpdump -i en13 -v 'port 123'
tcpdump -i en13 'port 123 or port 2049'
Testing showed that CRS evictions only occurred when a security scan ran against hosts that had the whitelist in place; scanning hosts without the whitelist did not affect the RAC processes. If you configure an OS whitelist with IPsec, or with iptables on Linux (just a few days ago another customer contacted me with exactly this problem after using iptables on Linux), the following two approaches tested out as viable:
1. Apply the port restrictions only to the public network; do not add any network policy restrictions on the interconnect (private) network.
2. Apply the port restrictions to all NICs, but permit private-IP traffic on the private network.
The ports we restricted did not include the 169.254 (HAIP) range, and testing showed no impact on CRS startup; if a port restriction ever affects HAIP communication, the HAIP range must likewise be whitelisted on the private network NIC.
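On Linux, approach 2 can be sketched with iptables. The interface and subnets below mirror this article's environment (private subnet 192.168.43.0/24, restricted port UDP 123); the interface name eth1 is an assumption standing in for the private NIC, so adapt all of these to your own setup. This is an illustration of the rule ordering, not the exact rules we deployed:

```shell
# Permit the private interconnect subnet first -- rule order matters,
# just as it does with AIX lsfilt/genfilt rules.
iptables -A INPUT -i eth1 -s 192.168.43.0/24 -j ACCEPT
# Permit the 169.254/16 link-local range used by HAIP on the private NIC.
iptables -A INPUT -i eth1 -s 169.254.0.0/16 -j ACCEPT
# Only then deny the restricted port everywhere else.
iptables -A INPUT -p udp --dport 123 -j DROP
```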