Does the FG (server) process communicate directly with a remote node's LMSn process over the interconnect?
A few days ago one of our RACs needed an OS-level UDP buffer adjustment; on HP-UX the corresponding parameter is socket_udp_rcvbuf_default, a per-process setting, i.e. the upper limit on the UDP receive buffer each process may use. When we discussed its impact on the system, namely that it consumes some host memory, the HP engineer's view was that as long as OS resources allow, a large value is harmless. How much memory does it actually consume? Only the buffers in use by active processes, which are released automatically as each process drains its datagrams; anything beyond the limit is simply dropped as an overflow at the network layer. Most inter-instance traffic uses UDP, a send-and-forget protocol: if the receiver misses a datagram, retransmission is coordinated by the sender's application layer. When the discussion turned to how many processes actually exchange messages between nodes, the on-site Oracle engineer's opinion was that it is just a handful of background processes on each side, say N LMS processes and one LMD process talking UDP between the instances. I raised my doubt: there should also be communication between a local server process and a remote LMS process. The vendor engineer showed me their white paper to argue that only LMS talks to LMS. So what is really going on: does LMS communicate directly with a remote server process?
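socket_udp_rcvbuf_default governs the default per-socket receive buffer on HP-UX, but the per-socket effect can be observed on any platform through the standard SO_RCVBUF socket option. A minimal sketch (the helper name `udp_rcvbuf_bytes` is mine, not from any Oracle or HP tool, and the kernel is free to round the requested value up or cap it at a system-wide limit):

```python
import socket

def udp_rcvbuf_bytes(requested=None):
    """Return the effective UDP receive-buffer size (bytes) for a fresh socket.

    If `requested` is given, ask the kernel for that size first via SO_RCVBUF;
    the kernel may round it (Linux doubles it for bookkeeping) or cap it at
    the system maximum, so we read the value back instead of trusting it.
    """
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        if requested is not None:
            s.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, requested)
        return s.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    finally:
        s.close()
```

Datagrams that arrive while this buffer is full are the "overflow" drops mentioned above; the sender's application layer, not the kernel, must notice and resend.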
Being the perfectionist that I am, this question nagged at me for days. I dug through some material, and when I later asked Laobai about it, his view was that behavior may differ between versions, but given the way Oracle usually engineers things, it probably does work that way: LMS talks directly across nodes to the server process. After all, imagine if every inter-node transfer had to go from the local process through LMS to the remote LMS to carry the GCS message, and then return along the same path; performance would clearly not be optimal. I found the following description by the master Riyaj Shamsudeen of the sequence when an FG process in a RAC environment needs to read a data block:
1. If the block is not in the local buffer cache, the process identifies the block's master node.
2. The FG process then sends a request over the interconnect to an LMS process on the master node.
3. Even though the request has been sent, it is not yet known which instance, if any, has the block cached.
4. Until the LMS responds, the process waits on a placeholder event such as gc cr request or gc current request.
5. When the LMS response arrives, the elapsed time is recorded against the corresponding event.
6. If the block is held in a remote instance's buffer cache in a compatible (shared) mode, the LMS process grants the block.
7. The remote LMS process transmits the block to the local FG process.
8. The FG process copies the block buffer into the local buffer cache.
9. The instance may then acquire a lock on the block, and at this point the FG process's wait shows up as the corresponding gc event.
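The hop pattern in the steps above can be sketched with three plain UDP sockets standing in for the FG process, the master node's LMS, and the holder node's LMS. This is only an illustration of the message flow on localhost, not Oracle's actual wire protocol; the payloads and the JSON forwarding format are invented. The point is that the final datagram reaches the FG socket directly from the holder's port, not the master's:

```python
import json
import socket

def gc_read_flow():
    """Simulate the hop pattern FG -> master LMS -> holder LMS -> FG.

    Three localhost UDP sockets stand in for the three processes. Returns
    the payload the FG received, the address it came from, and the master's
    and holder's bound addresses for comparison.
    """
    fg = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    master = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    holder = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for s in (fg, master, holder):
        s.bind(("127.0.0.1", 0))   # ephemeral port, like an Oracle IPC socket
        s.settimeout(2)

    # Steps 1-2: FG asks the master node's LMS for the block
    fg.sendto(b"gc cr request", master.getsockname())

    # Master LMS: note the requester's address, forward the request to the holder
    _, fg_addr = master.recvfrom(4096)
    master.sendto(json.dumps({"fg": fg_addr}).encode(), holder.getsockname())

    # Steps 6-7: the holder's LMS ships the block straight back to the FG process
    fwd, _ = holder.recvfrom(4096)
    holder.sendto(b"block image", tuple(json.loads(fwd)["fg"]))

    # Step 8: FG receives the block; record who actually sent it
    data, src = fg.recvfrom(4096)
    result = (data, src, master.getsockname(), holder.getsockname())
    for s in (fg, master, holder):
        s.close()
    return result
```

Run it and the source address of the reply equals the holder's socket, which is exactly the asymmetry the tcpdump experiment below looks for on a real cluster.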
Step 7 above says the remote LMS communicates directly with the FG process. A very simple test can verify this: just check whether the local FG process receives UDP packets from the remote LMS processes. The UDP/IP headers record the source and destination ports, so capturing the traffic on the host with tcpdump is all that is needed.
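The reason a packet capture settles the question is that every UDP datagram carries the sender's port, so the receiving side (or tcpdump watching it) can trace a packet back to the remote process that owns that port. A tiny sketch of that fact, using two hypothetical local sockets in place of the FG and LMS endpoints:

```python
import socket

def observed_source_port():
    """Show that a UDP receiver learns the sender's port from the datagram.

    Two localhost sockets stand in for the FG (receiver) and a remote LMS
    (sender). Returns the port recvfrom() reported and the port the sender
    was actually bound to; they must match.
    """
    recv = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # "FG" side
    send = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)  # "LMS" side
    recv.bind(("127.0.0.1", 0))
    send.bind(("127.0.0.1", 0))
    recv.settimeout(2)

    send.sendto(b"gcs message", recv.getsockname())
    _, (_, reported_port) = recv.recvfrom(1024)   # port taken from UDP header
    actual_port = send.getsockname()[1]

    recv.close()
    send.close()
    return reported_port, actual_port
```

On the cluster the same lookup runs in reverse: tcpdump records the source port, and lsof on the remote node maps that port back to a PID.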
The test environment is Oracle 11.2.0.3, a 2-node RAC on AIX 6.1.
# Network configuration
anbob1:/home/oracle# netstat -in
Name  Mtu    Network     Address            Ipkts       Ierrs  Opkts       Oerrs  Coll
en8   1500   link#2      40.f2.e9.91.5f.b8  2726498605  0      2191505707  2      0
en8   1500   133.96.60   133.96.60.61       2726498605  0      2191505707  2      0
en8   1500   133.96.60   133.96.60.161      2726498605  0      2191505707  2      0
en8   1500   133.96.60   133.96.60.219      2726498605  0      2191505707  2      0
en9   1500   link#3      40.f2.e9.91.60.a4  185025747   0      3958938719  2      0
en9   1500   192.168.60  192.168.60.61      185025747   0      3958938719  2      0
en9   1500   169.254     169.254.46.116     185025747   0      3958938719  2      0
lo0   16896  link#1                         280398452   0      280148799   0      0
lo0   16896  127         127.0.0.1          280398452   0      280148799   0      0
lo0   16896  ::1%1                          280398452   0      280148799   0      0

oracle@anbob2:/home/oracle> netstat -in
Name  Mtu    Network     Address            Ipkts       Ierrs  Opkts       Oerrs  Coll
en8   1500   link#2      40.f2.e9.91.5e.c2  905754685   0      3986285146  2      0
en8   1500   133.96.60   133.96.60.62       905754685   0      3986285146  2      0
en8   1500   133.96.60   133.96.60.162      905754685   0      3986285146  2      0
en9   1500   link#3      40.f2.e9.91.5d.ec  3957329386  0      179570814   2      0
en9   1500   192.168.60  192.168.60.62      3957329386  0      179570814   2      0
en9   1500   169.254     169.254.87.30      3957329386  0      179570814   2      0
lo0   16896  link#1                         3118217948  0      3112824578  0      0
lo0   16896  127         127.0.0.1          3118217948  0      3112824578  0      0
lo0   16896  ::1%1                          3118217948  0      3112824578  0      0
# Open a DB session on node 1 (session 1)
oracle@anbob1:/home/oracle> ora

SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 18 13:38:55 2015
Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management,
OLAP, Data Mining and Real Application Testing options

USERNAME  INST_NAME  HOST_NAME  SID   SERIAL#  VERSION     STARTED   SPID     OPID  CPID      SADDR             PADDR
--------  ---------  ---------  ----  -------  ----------  --------  -------  ----  --------  ----------------  ----------------
SYS       svp1       anbob1     7227  12631    11.2.0.3.0  20151112  3736900  957   12256212  0700000C223D23A0  0700000C194DE588
# Open a second terminal on node 1
# Check which network ports the process has opened, and cross-check them by port
anbob1:/home/oracle# lsof -i 4 -a -p 3736900
COMMAND      PID    USER  FD  TYPE  DEVICE              SIZE/OFF  NODE  NAME
oracle   3736900  oracle  5u  IPv4  0xf1000e0039c99a00  0t0       UDP   169.254.46.116:61380
oracle   3736900  oracle  6u  IPv4  0xf1000e0003253600  0t0       UDP   loopback:61381
oracle   3736900  oracle  9u  IPv4  0xf1000e003e29d200  0t0       UDP   169.254.46.116:61382

anbob1:/home/oracle# lsof -i :61382
COMMAND      PID    USER  FD  TYPE  DEVICE              SIZE/OFF  NODE  NAME
oracle   3736900  oracle  9u  IPv4  0xf1000e003e29d200  0t0       UDP   169.254.46.116:61382

anbob1:/home/oracle# lsof -i :61380
COMMAND      PID    USER  FD  TYPE  DEVICE              SIZE/OFF  NODE  NAME
oracle   3736900  oracle  5u  IPv4  0xf1000e0039c99a00  0t0       UDP   169.254.46.116:61380
# On node 1, use tcpdump to capture packets whose destination is either of the two UDP ports the local FG server process is listening on
anbob1:/home/oracle# tcpdump -i en9 -vnn 'dst host 169.254.46.116 and dst port 61382' or 'dst host 169.254.46.116 and dst port 61380'
tcpdump: listening on en9, link-type 1, capture size 96 bytes
# On node 2, create a new table and cache it in the buffer cache
oracle@anbob2:/home/oracle> ora

SQL*Plus: Release 11.2.0.3.0 Production on Wed Nov 18 13:50:56 2015
Copyright (c) 1982, 2011, Oracle.  All rights reserved.

Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management,
OLAP, Data Mining and Real Application Testing options

SQL> create table tt as select rownum id from dual connect by rownum<10;

Table created.

SQL> select * from tt;

        ID
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9

9 rows selected.
# Back in DB session 1 on node 1, query the table tt just created on node 2
SQL> select * from tt;

        ID
----------
         1
         2
         3
         4
         5
         6
         7
         8
         9

9 rows selected.
# The tcpdump running in node 1's second terminal now shows captured packets
anbob1:/home/oracle# tcpdump -i en9 -vnn 'dst host 169.254.46.116 and dst port 61382' or 'dst host 169.254.46.116 and dst port 61380'
tcpdump: listening on en9, link-type 1, capture size 96 bytes
13:51:53.369509 IP (tos 0x0, ttl 30, id 58603, offset 0, flags [none], proto: UDP (17), length: 292) 169.254.87.30.58865 > 169.254.46.116.61380: UDP, length 264
13:51:53.369904 IP (tos 0x0, ttl 30, id 58605, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58865 > 169.254.46.116.61380: UDP, length 192
13:51:53.374009 IP (tos 0x0, ttl 30, id 58674, offset 0, flags [none], proto: UDP (17), length: 292) 169.254.87.30.58883 > 169.254.46.116.61380: UDP, length 264
13:51:53.374407 IP (tos 0x0, ttl 30, id 58676, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58883 > 169.254.46.116.61380: UDP, length 192
13:51:53.375614 IP (tos 0x0, ttl 30, id 58694, offset 0, flags [none], proto: UDP (17), length: 292) 169.254.87.30.58879 > 169.254.46.116.61380: UDP, length 264
13:51:53.375996 IP (tos 0x0, ttl 30, id 58700, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58879 > 169.254.46.116.61380: UDP, length 192
13:51:53.377105 IP (tos 0x0, ttl 30, id 58723, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58865 > 169.254.46.116.61380: UDP, length 192
13:51:53.377994 IP (tos 0x0, ttl 30, id 58741, offset 0, flags [none], proto: UDP (17), length: 292) 169.254.87.30.58871 > 169.254.46.116.61380: UDP, length 264
13:51:53.378299 IP (tos 0x0, ttl 30, id 58748, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58871 > 169.254.46.116.61380: UDP, length 192
13:51:53.379099 IP (tos 0x0, ttl 30, id 58759, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58879 > 169.254.46.116.61380: UDP, length 192
13:51:53.380600 IP (tos 0x0, ttl 30, id 58830, offset 0, flags [none], proto: UDP (17), length: 292) 169.254.87.30.58875 > 169.254.46.116.61380: UDP, length 264
13:51:53.380995 IP (tos 0x0, ttl 30, id 58833, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58875 > 169.254.46.116.61380: UDP, length 192
13:51:53.381795 IP (tos 0x0, ttl 30, id 58845, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58879 > 169.254.46.116.61380: UDP, length 192
13:51:53.382491 IP (tos 0x0, ttl 30, id 58858, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58875 > 169.254.46.116.61380: UDP, length 192
13:51:53.383291 IP (tos 0x0, ttl 30, id 58871, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58865 > 169.254.46.116.61380: UDP, length 192
13:51:53.383787 IP (tos 0x0, ttl 30, id 58877, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58865 > 169.254.46.116.61380: UDP, length 192
13:51:53.384587 IP (tos 0x0, ttl 30, id 58889, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58871 > 169.254.46.116.61380: UDP, length 192
13:51:53.385295 IP (tos 0x0, ttl 30, id 58903, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58879 > 169.254.46.116.61380: UDP, length 192
13:51:53.386188 IP (tos 0x0, ttl 30, id 58913, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58871 > 169.254.46.116.61380: UDP, length 192
13:51:53.388295 IP (tos 0x0, ttl 30, id 58950, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58875 > 169.254.46.116.61380: UDP, length 192
13:51:53.390576 IP (tos 0x0, ttl 30, id 58996, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58875 > 169.254.46.116.61380: UDP, length 192
13:51:53.391481 IP (tos 0x0, ttl 30, id 59007, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58875 > 169.254.46.116.61380: UDP, length 192
13:51:53.392478 IP (tos 0x0, ttl 30, id 59021, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58871 > 169.254.46.116.61380: UDP, length 192
13:51:53.393181 IP (tos 0x0, ttl 30, id 59032, offset 0, flags [none], proto: UDP (17), length: 220) 169.254.87.30.58871 > 169.254.46.116.61380: UDP, length 192
Note:
The capture above shows packets from node 2 (IP 169.254.87.30) arriving from five UDP source ports: 58865, 58871, 58875, 58879 and 58883.
# On node 2, identify the processes that opened these UDP ports
anbob2:/home/oracle# lsof -i :58871
COMMAND       PID    USER  FD   TYPE  DEVICE              SIZE/OFF  NODE  NAME
oracle    2295732  oracle  24u  IPv4  0xf1000e003492f200  0t0       UDP   169.254.87.30:58871

anbob2:/home/oracle# lsof -i :58875
COMMAND       PID    USER  FD   TYPE  DEVICE              SIZE/OFF  NODE  NAME
oracle    2164270  oracle  24u  IPv4  0xf1000e002bd10600  0t0       UDP   169.254.87.30:58875

anbob2:/home/oracle# lsof -i :58879
COMMAND       PID    USER  FD   TYPE  DEVICE              SIZE/OFF  NODE  NAME
oracle    2228760  oracle  24u  IPv4  0xf1000e003efd3200  0t0       UDP   169.254.87.30:58879

anbob2:/home/oracle# lsof -i :58865
COMMAND       PID    USER  FD   TYPE  DEVICE              SIZE/OFF  NODE  NAME
oracle    2556378  oracle  24u  IPv4  0xf1000e001f6fb200  0t0       UDP   169.254.87.30:58865

anbob2:/home/oracle# lsof -i :58883
COMMAND       PID    USER  FD   TYPE  DEVICE              SIZE/OFF  NODE  NAME
oracle   10880168  oracle  24u  IPv4  0xf1000e003b9df200  0t0       UDP   169.254.87.30:58883
NOTE:
The output above shows five PIDs in total: 2295732, 2164270, 2228760, 2556378 and 10880168. Next, look up the process names.
anbob2:/home/oracle# ps -ef|egrep '2295732|2164270|2228760|2556378|10880168'|grep -v egrep
  oracle   2556378  1  0  Nov 12  -   11:15  ora_lmd0_svp2
  oracle   2228760  1  2  Nov 12  -  200:23  ora_lms2_svp2
  oracle  10880168  1  2  Nov 12  -  192:18  ora_lms3_svp2
  oracle   2164270  1  3  Nov 12  -  197:07  ora_lms1_svp2
  oracle   2295732  1  2  Nov 12  -  203:47  ora_lms0_svp2

oracle@anbob2:/home/oracle> ps -ef|grep lms|grep -v grep
  oracle   2228760  1  3  Nov 12  -  230:45  ora_lms2_svp2
  oracle  10880168  1  2  Nov 12  -  222:29  ora_lms3_svp2
  oracle   2164270  1  2  Nov 12  -  228:07  ora_lms1_svp2
    grid   2230220  1  0  Oct 28  -  116:01  asm_lms0_+ASM2
  oracle   2295732  1  2  Nov 12  -  235:38  ora_lms0_svp2
This confirms that the FG server process on node 1 does receive GCS and GES messages sent directly by node 2's LMS and LMD processes. One small question remains: why did all of the LMS processes send messages for a block holding only a few rows of data? I will analyze that later. If you dispute the experiment or see it differently, please let me know at weejar@gmail.com.