Alert: High CPU usage caused by frequent systemd-udevd processes when using udev to bind ASM storage on Linux
Recently I found that an Oracle host on a Linux (SUSE Linux 12) platform showed unusually high CPU usage even though the database was not busy. top showed a large number of systemd-udevd processes, and they were the main CPU consumers. The phenomenon is not limited to SUSE; RHEL and OEL can show the same behavior. It typically appears when udev rules are (re)processed, even though no disk or storage changes are being made on the system at the time.
Tasks: 5488 total,  95 running, 5393 sleeping,   0 stopped,   0 zombie
%Cpu(s):  5.4 us, 52.1 sy,  0.0 ni, 31.5 id, 10.2 wa,  0.0 hi,  0.8 si,  0.0 st
KiB Mem:  52805299+total, 41842224+used, 10963075+free,  4161240 buffers
KiB Swap: 33554428 total,        0 used, 33554428 free. 11332657+cached Mem

  PID USER    PR  NI    VIRT    RES    SHR S   %CPU  %MEM     TIME+ COMMAND
44213 grid    20   0 1899584 533268 108256 R 100.00 0.101 204368:13 mdb_mmon_-mgmtd
13175 root    20   0  247484   6136   4692 S 99.685 0.001 134536:12 .dlmmgr_exe
81845 oracle  20   0   59428  50420   2312 S 47.634 0.010 192716:12 OSWatcher.sh
18219 root    rt   0 1397816 219816 113228 S 29.338 0.042 100928:20 osysmond.bin
46307 oracle  20   0  0.176t  89100  80388 R 24.921 0.017   0:13.67 oracle_46307_or
78723 oracle  20   0  0.176t  93772  87608 S 24.290 0.018   7:44.77 oracle_78723_or
 4256 root    20   0   47628   8148   2932 D 20.505 0.002  16765:22 systemd-udevd
56824 root    20   0   63360   4144   2196 R 18.612 0.001  37966:24 zabbix-agentd
76935 root    20   0   46808   6280   1868 S 17.981 0.001   0:00.57 systemd-udevd
78478 root    20   0   46808   6280   1868 S 17.981 0.001   0:00.57 systemd-udevd
76748 root    20   0   46808   6280   1868 S 14.196 0.001   0:00.46 systemd-udevd
75507 root    20   0   46808   6280   1868 S 13.565 0.001   0:00.45 systemd-udevd
75484 root    20   0   46808   6280   1868 S 12.618 0.001   0:00.43 systemd-udevd
44059 grid    -2   0 1465092  55256  51916 S 11.672 0.010  16439:47 mdb_vktm_-mgmtd
25227 grid    -2   0 4602368  57444  54316 R 11.356 0.011  16421:34 asm_vktm_+asm1
81656 sensu   20   0  118068  11984   4628 S  9.464 0.002   0:00.30 ruby
26708 oracle  -2   0  0.176t  61952  58212 R  9.148 0.012  15507:34 ora_vktm_order1
77028 root    20   0   46808   6280   1868 S  7.886 0.001   0:00.25 systemd-udevd
81339 oracle  20   0   14908   4780   2940 S  7.571 0.001   0:00.24 bash
 2407 root     0 -20       0      0      0 S  6.940 0.000 289:40.17 kworker/56:1H
76859 root    20   0   46808   6280   1868 S  6.940 0.001   0:00.22 systemd-udevd
75575 root    20   0   46808   6280   1868 S  6.625 0.001   0:00.23 systemd-udevd
79110 root    20   0   46808   6280   1868 S  6.625 0.001   0:00.21 systemd-udevd
  145 root    rt   0       0      0      0 S  6.309 0.000 389:40.59 migration/27
 9015 oracle  20   0  111196   4404   3252 S  5.994 0.001   0:04.25 sshd
75591 root    20   0   46808   6280   1868 S  5.994 0.001   0:00.20 systemd-udevd
75482 root    20   0   46808   6280   1868 S  5.678 0.001   0:00.21 systemd-udevd
75613 root    20   0   46808   6280   1868 S  5.678 0.001   0:00.20 systemd-udevd
50575 oracle  20   0   20212   7512   2216 R  5.363 0.001   0:03.05 top
Note:
sys CPU is above 50%, and a large number of systemd-udevd processes show up in top.
oracle@anbob_com:/home/oracle> ps -ef|head -n 1;ps -ef|grep udevd
UID        PID  PPID  C STIME TTY          TIME CMD
root       569  4256  7 15:46 ?        00:00:01 /usr/lib/systemd/systemd-udevd
root       697  4256 13 15:46 ?        00:00:02 /usr/lib/systemd/systemd-udevd
root       730  4256 12 15:46 ?        00:00:02 /usr/lib/systemd/systemd-udevd
root       885  4256  3 15:46 ?        00:00:00 /usr/lib/systemd/systemd-udevd
root      1037  4256  3 15:46 ?        00:00:00 /usr/lib/systemd/systemd-udevd
root      1077  4256  9 15:46 ?        00:00:01 /usr/lib/systemd/systemd-udevd
root      1108  4256 15 15:46 ?        00:00:03 /usr/lib/systemd/systemd-udevd
root      1226  4256  9 15:46 ?        00:00:01 /usr/lib/systemd/systemd-udevd
root      1393  4256  4 15:46 ?        00:00:01 /usr/lib/systemd/systemd-udevd
root      1732  4256  6 15:46 ?        00:00:01 /usr/lib/systemd/systemd-udevd
root      1764  4256 16 15:46 ?        00:00:03 /usr/lib/systemd/systemd-udevd
root      1932  4256 13 15:46 ?        00:00:02 /usr/lib/systemd/systemd-udevd
root      2142  4256  8 15:46 ?        00:00:01 /usr/lib/systemd/systemd-udevd
root      2358  4256  3 15:46 ?        00:00:00 /usr/lib/systemd/systemd-udevd
root      2541  4256 10 15:46 ?        00:00:02 /usr/lib/systemd/systemd-udevd
root      2719  4256  3 15:46 ?        00:00:00 /usr/lib/systemd/systemd-udevd
root      2768  4256 12 15:46 ?        00:00:02 /usr/lib/systemd/systemd-udevd
root      3208  4256 13 15:46 ?        00:00:02 /usr/lib/systemd/systemd-udevd
...
oracle@anbob_com:/home/oracle> ps -ef|grep 4256
root      4256     1  1  2018 ?      11-15:25:34 /usr/lib/systemd/systemd-udevd
oracle@anbob_com:/home/oracle> ps -ef|grep udevd|wc -l
78
oracle@anbob_com:/home/oracle> ps -ef|grep udevd|wc -l
94
oracle@anbob_com:/home/oracle> ps -ef|grep udevd|wc -l
111
oracle@anbob_com:/home/oracle> ps -ef|grep udevd|wc -l
111
oracle@anbob_com:/home/oracle> ps -ef|grep udevd|wc -l
111
oracle@anbob_com:/home/oracle> ps -ef|grep udevd|wc -l
111
oracle@anbob_com:/home/oracle> ps -ef|grep udevd|wc -l
111
oracle@anbob_com:/home/oracle> ps -ef|grep udevd|wc -l
111
oracle@anbob_com:/home/oracle> ps -ef|grep udevd|wc -l
2
oracle@anbob_com:/home/oracle> ps -ef|grep udevd|wc -l
2
Note:
The workers appear intermittently: within about a minute a large burst of processes is spawned and then disappears again.
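To get a feel for how often these bursts happen, the worker count can be sampled in a loop; a rough sketch (the 5-second interval is arbitrary, stop it with Ctrl+C):

while true; do
    # print a timestamp and the current number of systemd-udevd processes
    echo "$(date '+%H:%M:%S')  $(ps -ef | grep -c '[u]devd')"
    sleep 5
done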
Monitor udev:
oracle@anbob_com:/var/log> udevadm monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

UDEV  [78038439.181262] change   /devices/virtual/block/sddlmdj (block)
UDEV  [78038439.185802] change   /devices/virtual/block/sddlmgg (block)
UDEV  [78038439.305983] change   /devices/virtual/block/sddlmdd (block)
UDEV  [78038439.306570] change   /devices/virtual/block/sddlmef (block)
UDEV  [78038439.481351] change   /devices/virtual/block/sddlmhk (block)
UDEV  [78038439.539047] change   /devices/virtual/block/sddlmdf (block)
UDEV  [78038439.565448] change   /devices/virtual/block/sddlmeh (block)
UDEV  [78038439.567730] change   /devices/virtual/block/sddlmdn (block)
UDEV  [78038439.599172] change   /devices/virtual/block/sddlmdl (block)
UDEV  [78038439.667162] change   /devices/virtual/block/sddlmdi (block)
UDEV  [78038439.678857] change   /devices/virtual/block/sddlmde (block)
UDEV  [78038439.686110] change   /devices/virtual/block/sddlmeo (block)
UDEV  [78038439.690323] change   /devices/virtual/block/sddlmek (block)
UDEV  [78038439.690703] change   /devices/virtual/block/sddlmfa (block)
UDEV  [78038439.703990] change   /devices/virtual/block/sddlmeb (block)
UDEV  [78038439.704975] change   /devices/virtual/block/sddlmem (block)
UDEV  [78038439.722081] change   /devices/virtual/block/sddlmfb (block)
UDEV  [78038439.723972] change   /devices/virtual/block/sddlmee (block)
UDEV  [78038439.733363] change   /devices/virtual/block/sddlmhd (block)
UDEV  [78038439.734902] change   /devices/virtual/block/sddlmdg (block)
UDEV  [78038439.752396] change   /devices/virtual/block/sddlmen (block)
UDEV  [78038439.757184] change   /devices/virtual/block/sddlmei (block)
UDEV  [78038439.758108] change   /devices/virtual/block/sddlmep (block)
UDEV  [78038439.758533] change   /devices/virtual/block/sddlmej (block)
UDEV  [78038439.759363] change   /devices/virtual/block/sddlmel (block)
KERNEL[78038448.429456] change   /devices/virtual/block/sddlmdd (block)
KERNEL[78038448.438217] change   /devices/virtual/block/sddlmdl (block)
KERNEL[78038448.441948] change   /devices/virtual/block/sddlmbl (block)
KERNEL[78038448.446149] change   /devices/virtual/block/sddlmbf (block)
KERNEL[78038448.450574] change   /devices/virtual/block/sddlmac (block)
KERNEL[78038448.455110] change   /devices/virtual/block/sddlmbc (block)
KERNEL[78038448.465154] change   /devices/virtual/block/sddlmfa (block)
KERNEL[78038448.478430] change   /devices/virtual/block/sddlmee (block)
KERNEL[78038448.483914] change   /devices/virtual/block/sddlmab (block)
KERNEL[78038448.488904] change   /devices/virtual/block/sddlmch (block)
KERNEL[78038448.493303] change   /devices/virtual/block/sddlmbe (block)
Note:
There are a large number of UDEV and KERNEL change events.
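If more detail on each event is needed, udevadm monitor can also print the event properties; restricting it to the block subsystem keeps the output manageable:

$ udevadm monitor --udev --kernel --property --subsystem-match=block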
grid@anbob_com:/dev> ls -l asm*
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk1 -> sddlmaa
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk10 -> sddlmaj
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk100 -> sddlmhh
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk101 -> sddlmhi
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk102 -> sddlmhj
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk103 -> sddlmhk
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk104 -> sddlmhl
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk105 -> sddlmhm
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk106 -> sddlmhn
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk107 -> sddlmho
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk108 -> sddlmhp
lrwxrwxrwx 1 root root 7 May 11 10:54 asm-disk109 -> sddlmia
....
oracle@anbob_com:/dev> ls -l asm*|awk '{print "ls -l " $11}'
ls -l sddlmaa
ls -l sddlmaj
ls -l sddlmhh
ls -l sddlmhi
ls -l sddlmhj
ls -l sddlmhk
ls -l sddlmhl
ls -l sddlmhm
ls -l sddlmhn
ls -l sddlmhg
oracle@anbob_com:/dev> ls -l sddlmgp
brw-rw---- 1 grid asmadmin 244, 240 May 11 15:39 sddlmgp
oracle@anbob_com:/dev> ls -l sddlmha
brw-rw---- 1 grid asmadmin 243,   0 May 11 15:39 sddlmha
oracle@anbob_com:/dev> ls -l sddlmhb
brw-rw---- 1 grid asmadmin 243,  16 May 11 15:39 sddlmhb
oracle@anbob_com:/dev> ls -l sddlmhc
brw-rw---- 1 grid asmadmin 243,  32 May 11 15:39 sddlmhc
oracle@anbob_com:/dev> ls -l sddlmhd
brw-rw---- 1 grid asmadmin 243,  48 May 11 15:39 sddlmhd
Note:
It can also be seen that the timestamps of the aggregated multipath devices on the OS, and of the asm-disk* link devices created by udev, keep changing.
There are no errors in the OS logs, and ASM usage is not currently affected.
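A quick way to watch the timestamps move is to stat a couple of the device nodes repeatedly; a minimal sketch (the two device names and the 60-second interval are just examples taken from the listing above):

$ for i in 1 2 3; do stat -c '%y %n' /dev/sddlmha /dev/sddlmhb; sleep 60; done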
The udev rules file was generated by the following script:
#
# append to /etc/udev/rules.d/99-oracle-asmdevices.rules
for i in b c d e f ; do
  echo "KERNEL==\"sddl*\", BUS==\"scsi\", PROGRAM==\"/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/\$name\", RESULT==\"`/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/sddl$i`\", NAME=\"asm-disk$i\", OWNER=\"grid\", GROUP=\"asmadmin\", MODE=\"0660\""
done
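For reference, each iteration of the loop emits one rule line of the following shape (the RESULT placeholder below is illustrative only; the real value is whatever scsi_id returns for that device on your system):

KERNEL=="sddl*", BUS=="scsi", PROGRAM=="/sbin/scsi_id --whitelisted --replace-whitespace --device=/dev/$name", RESULT=="<scsi_id output for /dev/sddlb>", NAME="asm-diskb", OWNER="grid", GROUP="asmadmin", MODE="0660"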
When an Oracle process opens such a device for writing and then closes it, a change uevent is synthesized, and any udev rule with ACTION=="add|change" will be re-applied. For inotify, udev has two options: watch (the default) and nowatch.
watch
    Watch the device node with inotify; when closed after being opened for writing, a change uevent will be synthesised.
nowatch
    Disable the watching of a device node with inotify.
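This is easy to demonstrate: with udevadm monitor running in one session, simply opening one of the ASM device nodes for writing and closing it again is enough to synthesize a change event; no data has to be written. A minimal sketch, assuming /dev/asm-disk1 exists and the shell runs as a user with write permission on it (grid or root):

# session 1: watch for events
$ udevadm monitor --udev --subsystem-match=block

# session 2: open the device read-write on fd 3, then close it without writing anything
$ exec 3<>/dev/asm-disk1; exec 3>&-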
Reading and writing these device files is normal behavior for Oracle. To suppress the resulting false-positive change events, disable the inotify watch for the devices used by Oracle ASM using the following steps:
1. Append the following line to the end of /etc/udev/rules.d/99-oracle-asmdevices.rules:
ACTION=="add|change", KERNEL=="sd*", OPTIONS:="nowatch"
2. Reload the udev rules so that the change above takes effect in memory (this can be done online):
$ /sbin/udevadm control --reload-rules
$ /sbin/udevadm trigger --type=devices --action=change
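After the reload it is worth verifying that the event storm has stopped, for example by repeating the earlier checks:

$ udevadm monitor --udev --subsystem-match=block   # should now stay quiet during normal database activity
$ ps -ef | grep -c '[u]devd'                       # the worker count should stay small and stable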
This resolved the problem. Note, however, that Oracle's advice is that environments which are not seeing udev-related performance problems should keep the original default watch OPTIONS.
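If you prefer not to disable the watch for every sd* device, the option can be scoped more narrowly to just the multipath devices used by ASM; a hedged variant of the rule in step 1, assuming the HDLM kernel names on this system all start with sddlm:

ACTION=="add|change", KERNEL=="sddlm*", OPTIONS:="nowatch"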