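Troubleshooting Oracle DB crash caused by Linux OOM kill (memory exhaustion)
Over the past six months I have seen at least four cases where Oracle exhausted memory, the Linux OOM killer killed Oracle processes, and the DB instance crashed. The usual cause is poor memory allocation: HugePages configured too large or not configured at all, an oversized SGA, or backup and export jobs that hold too much cached memory. I previously wrote 《Troubleshooting Out-Of-Memory(OOM) killer db crash when memory exhausted》; this post only records the symptoms of the recent cases.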
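Common analysis approach:
Check the DB alert log
Check the OS log
Identify the processes involved in the OOM kill
Use OSW to review vmstat, meminfo, ps, top
Check the top processes
Compare memory usage before and after
Pay attention to hugepages and PageTables
Use DASH to find the PGA usage trend
Check the PGA memory areas of the process
Use pmap to view process memory
lmhb trace (RAC)
Analyze the core dump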
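The OS-log step in the checklist above can be sketched as a small shell helper. This is a minimal sketch, assuming a syslog-style file such as /var/log/messages; the default path and the function name `oom_events` are illustrative, and the exact wording of OOM-killer messages varies by kernel version.

```shell
#!/usr/bin/env bash
# List OOM-killer related lines from a syslog-style file.
# The default path is an assumption; pass the real log file as $1.
oom_events() {
    local logfile="${1:-/var/log/messages}"
    # "invoked oom-killer" marks the trigger; the other two patterns name the victim.
    grep -iE 'invoked oom-killer|out of memory|killed process' "$logfile"
}
```

Typical use would be `oom_events /var/log/messages | tail`, then matching the reported timestamps against the instance termination time in the DB alert log.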
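Case 1
Oracle 19c on Exadata. Memory was exhausted and the instance crashed. The processes consuming the memory were LMS: five lms processes held about 60 GB in total, and their usage had been growing gradually.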
[weejar.weejar] ➤ head -n 2 ps.txt; cat ps.txt|egrep -v 'zzz|PPID' |sort -nrk 11|head -n 30
zzz <06/21/2023 12:10:10> subcount: 36
F S RUSER PID PPID C PSR PRI NI ADDR RSS SZ WIDE-WCHAN-COLUMN STIME TT TIME CMD
0 S oracle 61315 1 13 80 41 - - 17621076 105778972 poll_schedule_tim Apr07 ? 10-01:39:20 ora_lms3_phnsf3
0 S oracle 61307 1 13 91 41 - - 13360756 104579991 poll_schedule_tim Apr07 ? 9-22:57:59 ora_lms1_phnsf3
0 S oracle 61319 1 14 15 41 - - 11640576 104920982 - Apr07 ? 11-02:16:11 ora_lms4_phnsf3
0 S oracle 61303 1 14 10 41 - - 10840900 104616644 - Apr07 ? 10-21:27:22 ora_lms0_phnsf3
0 S oracle 61311 1 14 21 41 - - 10789784 104596654 poll_schedule_tim Apr07 ? 11-03:38:17 ora_lms2_phnsf3
0 S root 18754 1 0 79 19 0 - 1375980 2841981 futex_wait_queue_ Apr07 ? 12:44:55 /u01/orgrid/oracle/product/194/jdk/jre/bin/java -server -Xms512m -Xmx1024m -Djava.awt.headless=true -Ddisable.checkForUpdate=true -XX:ParallelGCThreads=5 oracle.rat.tfa.TFAMain /u01/orgrid/
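To put a number on the LMS footprint, the RSS column of the ps snapshot can be summed. A minimal sketch, assuming the OSWatcher ps layout shown here, where RSS (in kB) is the 11th column (the same column the `sort -nrk 11` above uses) and the command is the last field; the function name `lms_rss_gb` is illustrative.

```shell
#!/usr/bin/env bash
# Sum the RSS (kB, column 11) of all ora_lms* processes read from stdin
# and print the total in GB. The column position is an assumption tied to
# this particular ps output format.
lms_rss_gb() {
    awk '$NF ~ /^ora_lms[0-9a-z]+_/ { kb += $11 }
         END { printf "%.1f\n", kb / 1024 / 1024 }'
}

# Usage against one snapshot of the OSW ps file:
#   lms_rss_gb < ps.txt
```

For the five lms lines listed above this gives about 61 GB, consistent with the "about 60 GB" stated for this case.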
# pmap -x [PID]
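The growth is in anonymous pages (AnonPages). It matches the meminfo difference before and after the restart and is the region that consumes most of the memory.
Note: after the restart the process initially used only 60 MB; before the restart it had reached 17 GB.
Case 2
Oracle 11g. The database host runs regular local RMAN backups without a maximum file size specified to split the backup set into pieces, which produces a large amount of cached memory; Inactive(file) reached 100 GB.
In cases like this you can split the backup into smaller files, reduce how long the files are held, tune the OS kernel parameters, or manually release the cache after the backup completes.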
# sync; echo 3 > /proc/sys/vm/drop_caches
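Before dropping caches it helps to see how much file cache is actually reclaimable. A minimal sketch that prints one meminfo field in GB; the function name `meminfo_gb` is illustrative, and it reads meminfo-formatted text on stdin so it works on both /proc/meminfo and an OSWatcher meminfo snapshot.

```shell
#!/usr/bin/env bash
# Print a single meminfo field in GB. Input is meminfo-formatted text on stdin,
# where values are reported in kB.
meminfo_gb() {
    local field="$1"
    awk -v f="$field:" '$1 == f { printf "%.1f\n", $2 / 1024 / 1024 }'
}

# Usage on a live host:
#   meminfo_gb "Inactive(file)" < /proc/meminfo
```

Comparing the value before and after the `drop_caches` command above shows how much of the cache was clean and reclaimable.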
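Case 3
Oracle 11g on a two-node VCS HA cluster. Memory was exhausted: someone had changed the hugepage setting, and after the most recent instance restart most of the SGA was no longer backed by hugepages, so PageTables wasted a large amount of memory. LogMiner also parses archive logs in real time on this host, so OS cache usage was high as well. The OS log (Linux 6.9, kernel 2.6.32-696.el6) also reported hung tasks.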
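Why a hugepage shortfall inflates PageTables can be estimated with simple arithmetic. This is a rough sketch, assuming x86_64 with 4 KB base pages and 8-byte page-table entries, and counting only the lowest-level entries; the function name `pte_mb_per_process` is illustrative. The 220 GB input is the part of the SGA that the instance startup log in this case reports as not covered by large pages.

```shell
#!/usr/bin/env bash
# Estimate the page-table memory (MB) one process needs to map a shared
# memory region of the given size (GB) with 4 KB pages.
# Assumption: one 8-byte entry per 4 KB page; upper-level tables are ignored.
pte_mb_per_process() {
    local sga_gb="$1"
    # pages = GB * 1024 * 1024 kB / 4 kB; bytes = pages * 8; MB = bytes / 1024 / 1024
    echo $(( sga_gb * 1024 * 1024 / 4 * 8 / 1024 / 1024 ))
}

pte_mb_per_process 220   # prints 440
```

Page tables for a shared SGA mapped with 4 KB pages are kept per process, so at up to about 440 MB for each process that touches the whole SGA, a few hundred server processes are enough to reach PageTables values in the range of hundreds of GB.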
Memory comparison
[root@anbob2 oswmeminfo]# more anbob2_meminfo_23.07.11.0900.dat
zzz ***Tue Jul 11 09:00:08 CST 2023 zzz ***Tue Jul 11 13:08:51 CST 2023
MemTotal: 528940012 kB MemTotal: 528940012 kB
MemFree: 57401212 kB MemFree: 667468 kB <<<< -50g
Buffers: 79244 kB Buffers: 5588 kB
Cached: 349472200 kB Cached: 221741568 kB <<<<< -100g
SwapCached: 14168 kB SwapCached: 192832 kB
Active: 277106340 kB Active: 201495284 kB
Inactive: 92552196 kB Inactive: 42237424 kB <<<< -50g
Active(anon): 197398472 kB Active(anon): 201425664 kB
Inactive(anon): 45652700 kB Inactive(anon): 42184392 kB
Active(file): 79707868 kB Active(file): 69620 kB <<<< -70g
Inactive(file): 46899496 kB Inactive(file): 53032 kB <<<< -40g
Unevictable: 2380312 kB Unevictable: 2379472 kB
Mlocked: 558700 kB Mlocked: 568080 kB
SwapTotal: 32767996 kB SwapTotal: 32767996 kB
SwapFree: 31869992 kB SwapFree: 30463804 kB
Dirty: 16517776 kB Dirty: 48 kB <<<< -16g
Writeback: 0 kB Writeback: 0 kB
AnonPages: 22516868 kB AnonPages: 24230932 kB
Mapped: 156051936 kB Mapped: 161001544 kB
Shmem: 222796464 kB Shmem: 221477168 kB
Slab: 3461896 kB Slab: 1969372 kB
SReclaimable: 1878396 kB SReclaimable: 876144 kB
SUnreclaim: 1583500 kB SUnreclaim: 1093228 kB
KernelStack: 163600 kB KernelStack: 163296 kB
PageTables: 69364292 kB PageTables: 252790084 kB <<<< +200g
NFS_Unstable: 0 kB NFS_Unstable: 0 kB
Bounce: 0 kB Bounce: 0 kB
WritebackTmp: 0 kB WritebackTmp: 0 kB
CommitLimit: 286999024 kB CommitLimit: 286999024 kB
Committed_AS: 289703328 kB Committed_AS: 291139212 kB
VmallocTotal: 34359738367 kB VmallocTotal: 34359738367 kB
VmallocUsed: 1739320 kB VmallocUsed: 1739320 kB
VmallocChunk: 33956569304 kB VmallocChunk: 33956569304 kB
HardwareCorrupted: 0 kB HardwareCorrupted: 0 kB
AnonHugePages: 2150400 kB AnonHugePages: 2072576 kB
HugePages_Total: 9999 HugePages_Total: 9999
HugePages_Free: 307 HugePages_Free: 307
HugePages_Rsvd: 293 HugePages_Rsvd: 293
HugePages_Surp: 0 HugePages_Surp: 0
Hugepagesize: 2048 kB Hugepagesize: 2048 kB
DirectMap4k: 65536 kB DirectMap4k: 65536 kB
DirectMap2M: 1761280 kB DirectMap2M: 1761280 kB
DirectMap1G: 534773760 kB
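The side-by-side comparison above can also be produced mechanically. A minimal sketch, assuming each snapshot has been saved to its own file in plain meminfo format (one `Key: value kB` per line); the function name `meminfo_delta` and the file names are illustrative.

```shell
#!/usr/bin/env bash
# Print the change in GB for every kB-valued meminfo field that differs
# between two snapshot files.
meminfo_delta() {
    local before="$1" after="$2"
    awk '
        NR == FNR { a[$1] = $2; next }                    # first file: remember every field
        $3 == "kB" && ($1 in a) && $2 != a[$1] {          # second file: report changed fields
            printf "%-16s %+8.1f GB\n", $1, ($2 - a[$1]) / 1024 / 1024
        }
    ' "$before" "$after"
}

# Usage:
#   meminfo_delta meminfo_0900.txt meminfo_1308.txt | sort -k2 -n
```

Fields without a kB unit, such as HugePages_Total, are skipped on purpose because they are page counts rather than sizes.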
Kernel parameters
[root@anbob2 oswmeminfo]# ll /etc/sysctl.conf
-rw-r--r-- 1 root root 1437 Sep 5 2018 /etc/sysctl.conf
[root@anbob2 oswmeminfo]# grep -i huge /etc/sysctl.conf
vm.nr_hugepages = 153600
[root@anbob2 oswmeminfo]# sysctl -a|grep -i huge
vm.nr_hugepages = 9999 <<<<<<<<<<<<<<<<<< NODE2: configured value not in effect
vm.nr_hugepages_mempolicy = 9999
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
[root@anbob1 oswmeminfo]# grep -i huge /etc/sysctl.conf
vm.nr_hugepages = 153600 <<<<<<<<<<<<<<<<<< NODE1
[root@anbob1 oswmeminfo]# sysctl -a|grep -i huge
vm.nr_hugepages = 153600
vm.nr_hugepages_mempolicy = 153600
vm.hugetlb_shm_group = 0
vm.hugepages_treat_as_movable = 0
vm.nr_overcommit_hugepages = 0
[root@anbob1 oswmeminfo]# ps -ef|grep smon
root 37937 17846 0 17:16 pts/1 00:00:00 grep smon
oracle 59711 1 0 Jul11 ? 00:02:11 ora_smon_IMSP
Time of the change
$ egrep "^zzz|HugePages_Total" oswmeminfo
HugePages_Total: 122625
zzz ***Sat Jul 8 10:12:30 CST 2023
HugePages_Total: 122625
zzz ***Sat Jul 8 10:12:41 CST 2023
HugePages_Total: 122625
zzz ***Sat Jul 8 10:12:51 CST 2023
HugePages_Total: 122625
zzz ***Sat Jul 8 10:13:01 CST 2023
HugePages_Total: 122625
zzz ***Sat Jul 8 10:13:12 CST 2023
HugePages_Total: 122625
zzz ***Sat Jul 8 10:13:22 CST 2023
HugePages_Total: 122625
zzz ***Sat Jul 8 10:13:32 CST 2023
HugePages_Total: 122625
zzz ***Sat Jul 8 10:13:42 CST 2023 <<<<<<<<<<<<<< number of hugepages changed
HugePages_Total: 9999
zzz ***Sat Jul 8 10:13:52 CST 2023
HugePages_Total: 9999
zzz ***Sat Jul 8 10:14:03 CST 2023
HugePages_Total: 9999
zzz ***Sat Jul 8 10:14:13 CST 2023
HugePages_Total: 9999
zzz ***Sat Jul 8 10:14:23 CST 2023
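Rather than scanning the egrep output by eye, the first snapshot with a different value can be picked out automatically. A minimal sketch, assuming the usual OSWatcher meminfo layout in which each snapshot starts with a `zzz ***<timestamp>` line followed by its meminfo lines; the function name `hugepages_changes` is illustrative.

```shell
#!/usr/bin/env bash
# Report every OSWatcher snapshot whose HugePages_Total differs from the
# previous snapshot. Reads the files given as arguments, or stdin.
hugepages_changes() {
    awk '
        /^zzz/ { sub(/^zzz \*\*\*/, ""); ts = $0; next }   # remember the snapshot timestamp
        $1 == "HugePages_Total:" {
            if (prev != "" && $2 != prev)
                printf "%s: %s -> %s\n", ts, prev, $2
            prev = $2
        }
    ' "$@"
}

# Usage:
#   hugepages_changes oswmeminfo
```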
Instance startup log
2023-07-08 10:14:46.161000 +08:00
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 313 GB
Total Shared Global Region in Large Pages = 240 GB (100%)
Large Pages used by this instance: 122625 (240 GB) <<<<<<<<<<<<<<<<<<<<<<<< 200
Large Pages unused system wide = 30975 (60 GB)
Large Pages configured system wide = 153600 (300 GB)
Large Page size = 2048 KB
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 96
Number of processor cores in the system is 48
Number of processor sockets in the system is 4
Picked latch-free SCN scheme 3
2023-07-08 10:28:37.020000 +08:00
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 313 GB
Total Shared Global Region in Large Pages = 20 GB (8%)
Large Pages used by this instance: 9985 (20 GB) <<<<<<<<<<<<<<<<<<<<<<<< 201
Large Pages unused system wide = 14 (28 MB)
Large Pages configured system wide = 9999 (20 GB)
Large Page size = 2048 KB
RECOMMENDATION:
Total System Global Area size is 240 GB. For optimal performance,
prior to the next instance restart:
1. Increase the number of unused large pages by
at least 112626 (page size 2048 KB, total size 220 GB) system wide to
get 100% of the System Global Area allocated with large pages
********************************************************************
Instance shutdown complete
2023-07-08 10:44:25.617000 +08:00
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 313 GB
Total Shared Global Region in Large Pages = 20 GB (8%) <<<<<<<<<<<<<<<<<<<<<<<< 201
Large Pages used by this instance: 9985 (20 GB)
Large Pages unused system wide = 14 (28 MB)
Large Pages configured system wide = 9999 (20 GB)
Large Page size = 2048 KB
RECOMMENDATION:
Total System Global Area size is 240 GB. For optimal performance,
prior to the next instance restart:
1. Increase the number of unused large pages by
at least 112626 (page size 2048 KB, total size 220 GB) system wide to
get 100% of the System Global Area allocated with large pages
2023-07-11 13:29:38.507000 +08:00
Starting ORACLE instance (normal)
************************ Large Pages Information *******************
Per process system memlock (soft) limit = 313 GB
Total Shared Global Region in Large Pages = 240 GB (100%) <<<<<<<<<<<<<<<<<<<<<<<< 200
Large Pages used by this instance: 122625 (240 GB)
Large Pages unused system wide = 30975 (60 GB)
Large Pages configured system wide = 153600 (300 GB)
Large Page size = 2048 KB
********************************************************************
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
Initial number of CPU is 96
Number of processor cores in the system is 48
Number of processor sockets in the
Note:
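On the node that hit OOM, /etc/sysctl.conf sets vm.nr_hugepages to 153600, but the value currently in effect in memory is 9999. The OS has not been rebooted for more than 900 days. After its most recent restart the DB instance got only 20 GB of large pages (8%); after switching to the other node it could use 100%. Most likely someone ran sysctl -w after the previous instance restart and shrank vm.nr_hugepages at runtime, and the pages were only really released when the DB instance was restarted most recently. The exact time of that adjustment is unknown.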
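A drift between the persistent setting and the running kernel, as described in this note, is cheap to check before every planned instance restart. A minimal sketch; the function name `check_hugepages` and its two-argument interface are illustrative, while `sysctl -n` is the standard way to print a bare runtime value.

```shell
#!/usr/bin/env bash
# Compare the persistent vm.nr_hugepages setting with a runtime value.
#   $1 = path to a sysctl.conf-style file
#   $2 = runtime value, e.g. "$(sysctl -n vm.nr_hugepages)"
check_hugepages() {
    local conf="$1" runtime="$2" configured
    # Take the last uncommented "vm.nr_hugepages = N" line; the anchored
    # pattern does not match vm.nr_hugepages_mempolicy.
    configured=$(awk -F= '/^[ \t]*vm\.nr_hugepages[ \t]*=/ { gsub(/[ \t]/, "", $2); v = $2 }
                          END { print v }' "$conf")
    if [ "$configured" = "$runtime" ]; then
        echo "OK: vm.nr_hugepages=$runtime"
    else
        echo "MISMATCH: sysctl.conf=$configured runtime=$runtime"
    fi
}

# Usage on a live host:
#   check_hugepages /etc/sysctl.conf "$(sysctl -n vm.nr_hugepages)"
```

With the values from node 2 in this case (153600 configured, 9999 at runtime) the check reports a mismatch.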
zzz ***Tue Jul 11 13:03:20 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
12 4 932884 1076188 4748 224322464 0 0 1026 75 0 0 1 0 98 0 0
11 1 932884 1017352 4744 224328432 0 0 4452 11952 80532 49144 14 5 81 1 0
10 1 932880 1016912 4740 224333232 0 0 4472 13153 84146 52376 13 3 83 1 0
zzz ***Tue Jul 11 13:03:31 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
17 1 932880 849220 5612 224316912 0 0 1026 75 0 0 1 0 98 0 0
14 0 932880 779512 5612 224323408 0 0 4364 12692 86111 55005 14 5 81 1 0
19 1 932880 786308 5612 224328480 0 0 4148 12529 65594 47201 12 3 84 1 0
zzz ***Tue Jul 11 13:03:41 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
249 3 932880 762192 3492 224164800 0 0 1026 75 0 0 1 0 98 0 0
271 2 932880 666988 3744 224165296 0 0 428 144 122169 19786 3 77 19 2 0
103 1 1128640 673972 5516 223528912 4720 197628 53808 848077 33375721 1517708 0 100 0 0 0 <<<<< sys cpu 100%
<<<<<<<<<<<<<<< snapshot gap
zzz ***Tue Jul 11 13:08:51 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
79 0 2191116 676560 5716 221857248 0 0 1026 75 0 0 1 0 98 0 0
246 3 2241572 678388 5772 221805008 12 49240 572 49585 126798 6238 1 98 1 0 0
194 0 2277520 669800 5676 221775456 0 35952 84 45364 147335 5038 0 98 1 0 0
zzz ***Tue Jul 11 13:09:11 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
69 0 3729636 793628 3980 220292544 0 0 1026 75 0 0 1 0 98 0 0
67 0 3730048 769428 4068 220292656 0 412 252 777 108784 30835 1 66 33 0 0
67 0 3730184 770468 4068 220292864 192 164 300 741 107760 32892 1 64 35 0 0
zzz ***Tue Jul 11 13:09:29 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
255 4 3791552 663928 4728 220244272 0 0 1026 75 0 0 1 0 98 0 0
263 0 3796552 659472 4728 220239904 0 4996 1180 5012 127310 2275 0 100 0 0 0
261 0 3799912 661764 4728 220236400 0 3364 0 3364 132814 2372 0 100 0 0 0
zzz ***Tue Jul 11 13:12:04 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
194 0 4598176 679972 3988 219445824 0 0 1026 75 0 0 1 0 98 0 0
170 0 4627960 668200 4008 219415520 48 29792 200 29856 118756 10679 1 97 2 0 0
127 0 4664620 672196 4008 219379904 0 36660 140 38193 129276 24727 2 95 3 0 0
zzz ***Tue Jul 11 13:12:20 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
99 3 5653004 774452 3168 218391728 0 0 1026 75 0 0 1 0 98 0 0
100 6 5770224 659328 3096 218280864 688 118168 1512 140937 152119 52093 6 60 33 1 0
110 0 5884952 682720 3128 218162976 128 113932 2460 115510 142064 25756 2 90 8 1 0
zzz ***Tue Jul 11 13:12:38 CST 2023
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
58 0 6345164 756684 2168 217698096 0 0 1026 75 0 0 1 0 98 0 0
51 1 6363432 734976 2164 217677440 0 17248 680 17392 100229 12689 1 53 46 1 0
50 1 6372108 739220 2164 217669120 0 8676 932 8713 95133 11688 0 51 48 1 0
# iostat
zzz ***Tue Jul 11 13:03:41 CST 2023
avg-cpu: %user %nice %system %iowait %steal %idle
0.11 0.00 99.74 0.04 0.00 0.11
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 5.30 719.39 8.10 19.62 182.40 2954.20 226.33 0.25 8.95 2.82 11.48 0.93 2.57
sdi 0.00 0.00 0.14 0.04 0.91 0.87 19.50 0.00 0.90 1.05 0.42 0.90 0.02
..
dm-0 0.00 0.00 9.30 568.98 165.81 2275.93 8.44 11.08 19.13 2.74 19.40 0.01 0.76
dm-1 0.00 0.00 4.15 171.11 16.59 684.44 8.00 2.85 16.18 2.63 16.51 0.16 2.81
..
VxVM28000 0.00 0.00 0.82 0.20 5.71 3.98 19.07 0.00 1.54 0.67 5.07 1.29 0.13
VxVM28001 0.00 0.00 0.01 0.07 0.03 0.68 17.57 0.00 0.39 0.00 0.43 0.39 0.00
VxVM28002 0.00 0.00 0.01 0.01 0.08 0.08 13.71 0.00 0.29 0.25 0.33 0.29 0.00
VxVM28003 0.00 0.00 0.01 0.02 0.08 0.17 14.40 0.00 0.10 0.00 0.17 0.10 0.00
VxVM28004 0.00 0.00 0.01 0.01 0.08 0.08 13.71 0.00 0.14 0.00 0.33 0.14 0.00
zzz ***Tue Jul 11 13:08:51 CST 2023
avg-cpu: %user %nice %system %iowait %steal %idle
3.63 0.00 59.94 4.26 0.00 32.17
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 214.42 35776.92 212.50 14915.38 9988.46 203992.31 28.29 34.26 2.28 0.75 2.31 0.06 88.75
..
dm-0 0.00 0.00 154.81 20.19 8900.00 80.77 102.64 0.15 0.88 0.28 5.52 0.70 12.31
dm-1 0.00 0.00 272.12 50750.00 1088.46 203000.00 8.00 149.39 2.88 2.62 2.88 0.02 91.44 <<<<<<<<
..
sdaj 0.00 0.00 0.00 1.92 0.00 84.62 88.00 0.00 0.50 0.00 0.50 0.50 0.10
VxVM28000 0.00 0.00 17.31 27.88 76.92 3844.23 173.53 0.03 0.64 0.56 0.69 0.43 1.92
VxVM28001 0.00 0.00 0.00 16.35 0.00 3651.92 446.82 0.01 0.88 0.00 0.88 0.59 0.96
VxVM28002 0.00 0.00 12.50 0.96 100.00 7.69 16.00 0.01 0.57 0.62 0.00 0.57 0.77
VxVM28003 0.00 0.00 1.92 0.00 15.38 0.00 16.00 0.01 7.00 7.00 0.00 7.00 1.35
VxVM28004 0.00 0.00 1.92 2.88 15.38 146.15 67.20 0.00 0.60 0.50 0.67 0.40 0.19
[root@anbob2 oswmeminfo]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sdb 8:16 0 500G 0 disk
├─sdb3 8:19 0 500G 0 part
└─sdb8 8:24 0 500G 0 part
sdc 8:32 0 500G 0 disk
├─sdc3 8:35 0 500G 0 part
└─sdc8 8:40 0 500G 0 part
sdd 8:48 0 500G 0 disk
├─sdd3 8:51 0 500G 0 part
└─sdd8 8:56 0 500G 0 part
sde 8:64 0 500G 0 disk
├─sde3 8:67 0 500G 0 part
└─sde8 8:72 0 500G 0 part
sda 8:0 0 3.7T 0 disk
├─sda1 8:1 0 524M 0 part /boot/efi
├─sda2 8:2 0 500M 0 part /boot
└─sda3 8:3 0 3.7T 0 part
  ├─rootvg-rootlv (dm-0) 253:0 0 2T 0 lvm /
  └─rootvg-swaplv (dm-1) 253:1 0 31.3G 0 lvm [SWAP]
sdf 8:80 0 500G 0 disk
# mpstat
zzz ***Tue Jul 11 13:03:41 CST 2023
Linux 2.6.32-696.el6.x86_64 (anbob2) 07/11/23 _x86_64_ (96 CPU)
13:03:41 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
13:03:42 all 2.86 0.00 76.85 1.68 0.00 0.05 0.00 0.00 18.56
13:03:42 0 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
13:03:42 1 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
13:03:42 2 3.03 0.00 22.22 5.05 0.00 0.00 0.00 0.00 69.70
13:03:42 3 23.00 0.00 49.00 25.00 0.00 0.00 0.00 0.00 3.00
13:03:42 4 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
13:03:42 5 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
13:03:42 6 1.00 0.00 17.00 0.00 0.00 0.00 0.00 0.00 82.00
13:03:42 7 0.99 0.00 93.07 0.00 0.00 0.00 0.00 0.00 5.94
13:03:42 8 0.00 0.00 100.00 0.00 0.00 0.00 0.00 0.00 0.00
13:03:42 9 44.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 54.00
Note:
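During this period OSW has a gap of several minutes: sys CPU was at 100% and the OS or the file system hung. Once it recovered, swap was being written out to disk, and both before and after the gap there was a backlog of runnable processes (the r column in vmstat).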
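Gaps like this one are easy to miss in a long OSWatcher file, so it can help to flag them automatically. A minimal sketch, assuming `zzz ***<Day> <Mon> <DD> <HH:MM:SS> <TZ> <YYYY>` header lines and snapshots that all fall on the same day (it compares only the time of day); the function name `osw_gaps` and the 60-second default threshold are illustrative.

```shell
#!/usr/bin/env bash
# Flag gaps between consecutive OSWatcher snapshots read from stdin.
#   $1 = threshold in seconds (default 60)
# Assumption: all snapshots are from the same calendar day.
osw_gaps() {
    local threshold="${1:-60}"
    awk -v max="$threshold" '
        /^zzz/ {
            split($5, t, ":")                       # $5 is HH:MM:SS
            now = t[1] * 3600 + t[2] * 60 + t[3]
            if (seen && now - prev > max)
                printf "gap of %d s before %s\n", now - prev, $5
            prev = now; seen = 1
        }
    '
}

# Usage:
#   osw_gaps 60 < anbob2_vmstat_file.dat
```

For the vmstat snapshots in this case, the jump from 13:03:41 to 13:08:51 would be reported as a gap of 310 seconds.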
Comparison before and after the OSW gap
zzz ***Tue Jul 11 13:03:41 CST 2023 zzz ***Tue Jul 11 13:08:26 CST 2023
MemTotal: 528940012 kB MemTotal: 528940012 kB
MemFree: 768460 kB MemFree: 664400 kB
Buffers: 3492 kB Buffers: 4512 kB
Cached: 224165076 kB Cached: 223746948 kB
SwapCached: 14440 kB SwapCached: 42048 kB
Active: 200373904 kB Active: 201907664 kB
Inactive: 45618352 kB Inactive: 43839988 kB
Active(anon): 199794668 kB Active(anon): 201438972 kB
Inactive(anon): 45041400 kB Inactive(anon): 43372572 kB
Active(file): 579236 kB Active(file): 468692 kB
Inactive(file): 576952 kB Inactive(file): 467416 kB
Unevictable: 2380312 kB Unevictable: 2380312 kB
Mlocked: 558700 kB Mlocked: 558700 kB
SwapTotal: 32767996 kB SwapTotal: 32767996 kB
SwapFree: 31835116 kB SwapFree: 31647260 kB
Dirty: 590572 kB Dirty: 270588 kB <<<<<<<<<<<<<<<<<<<<<<<<<<<
Writeback: 4 kB Writeback: 4612 kB
AnonPages: 24248696 kB AnonPages: 24398188 kB
Mapped: 161099688 kB Mapped: 161100900 kB
Shmem: 222851540 kB Shmem: 222664684 kB
Slab: 2050540 kB Slab: 2036328 kB
SReclaimable: 933644 kB SReclaimable: 917188 kB
SUnreclaim: 1116896 kB SUnreclaim: 1119140 kB
KernelStack: 163072 kB KernelStack: 165104 kB
PageTables: 250403824 kB PageTables: 250721084 kB
NFS_Unstable: 0 kB NFS_Unstable: 0 kB
Bounce: 0 kB Bounce: 0 kB
WritebackTmp: 0 kB WritebackTmp: 0 kB
CommitLimit: 286999024 kB CommitLimit: 286999024 kB
Committed_AS: 291078736 kB Committed_AS: 291359788 kB
VmallocTotal: 34359738367 kB VmallocTotal: 34359738367 kB
VmallocUsed: 1739320 kB VmallocUsed: 1739320 kB
VmallocChunk: 33956569304 kB VmallocChunk: 33956569304 kB
HardwareCorrupted: 0 kB HardwareCorrupted: 0 kB
AnonHugePages: 2131968 kB AnonHugePages: 2131968 kB
HugePages_Total: 9999 HugePages_Total: 9999
HugePages_Free: 307 HugePages_Free: 307
HugePages_Rsvd: 293 HugePages_Rsvd: 293
HugePages_Surp: 0 HugePages_Surp: 0
Hugepagesize: 2048 kB Hugepagesize: 2048 kB
DirectMap4k: 65536 kB DirectMap4k: 65536 kB
DirectMap2M: 1761280 kB DirectMap2M: 1761280 kB
DirectMap1G: 534773760 kB DirectMap1G: 534773760 kB
$ egrep "vm.swapp|vm.dirty_back|dirty_ratio|vm.dirty_expire|vm.dirty_write|vm.min_free|vm.vfs" sysctl_-a
vm.swappiness = 20
vm.dirty_background_ratio = 10
vm.dirty_background_bytes = 0
vm.dirty_ratio = 20
vm.dirty_bytes = 0
vm.dirty_writeback_centisecs = 500
vm.dirty_expire_centisecs = 3000
vm.min_free_kbytes = 135168
Suggested settings so that dirty pages are flushed out of the page cache earlier
# our problem is things not swapping, so let's come up from 0
vm.swappiness=10
# Maximum percentage of active memory that can have dirty pages: the maximum percentage of ((Cache + Free) - Mapped)
# memory that can be dirty before it is written to disk by the pdflush process
vm.dirty_background_ratio=3
# Maximum percentage of total memory that can have dirty pages: the ratio that represents the percentage of MemTotal that
# can consume dirty pages before all processes must write dirty buffers back to disk; when this value is reached all
# I/O is blocked for any new writes until dirty pages have been flushed
vm.dirty_ratio=15
# How long data can be in page cache before being expired (hundredths of a second)
vm.dirty_expire_centisecs=500
# How often pdflush is activated to clean dirty pages (hundredths of a second)
vm.dirty_writeback_centisecs=100
OS LOG
Jul 11 12:15:25 szimsdb2 kernel: INFO: task ps:54176 blocked for more than 120 seconds.
Jul 11 12:15:25 szimsdb2 kernel: Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 12:15:25 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 12:15:25 szimsdb2 kernel: ps D 000000000000003b 0 54176 54159 0x00000080
Jul 11 12:15:25 szimsdb2 kernel: ffff886d0ae63c68 0000000000000086 0000000000000000 ffffffff8123b0b6
Jul 11 12:15:25 szimsdb2 kernel: ffff886d0ae63be8 ffff886d0ae63e08 0131d140e4e0f731 000000000ae63cd8
Jul 11 12:15:25 szimsdb2 kernel: ffffffff8100bc0e 000000150d00bb8b ffff886556cf65f8 ffff886d0ae63fd8
Jul 11 12:15:25 szimsdb2 kernel: Call Trace:
Jul 11 12:15:25 szimsdb2 kernel: [] ? security_task_to_inode+0x16/0x20
Jul 11 12:15:25 szimsdb2 kernel: [] ? apic_timer_interrupt+0xe/0x20
Jul 11 12:15:25 szimsdb2 kernel: [] ? mutex_spin_on_owner+0x9b/0xc0
Jul 11 12:15:25 szimsdb2 kernel: [] __mutex_lock_slowpath+0x96/0x210
Jul 11 12:15:25 szimsdb2 kernel: [] mutex_lock+0x2b/0x50
Jul 11 12:15:25 szimsdb2 kernel: [] pipe_write+0x7e/0x6b0
Jul 11 12:15:25 szimsdb2 kernel: [] do_sync_write+0xfa/0x140
Jul 11 12:15:25 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 12:15:25 szimsdb2 kernel: [] ? mntput_no_expire+0x30/0x110
Jul 11 12:15:25 szimsdb2 kernel: [] ? security_file_permission+0x16/0x20
Jul 11 12:15:25 szimsdb2 kernel: [] vfs_write+0xb8/0x1a0
Jul 11 12:15:25 szimsdb2 kernel: [] ? fget_light_pos+0x16/0x50
Jul 11 12:15:25 szimsdb2 kernel: [] sys_write+0x51/0xb0
Jul 11 12:15:25 szimsdb2 kernel: [] ? __audit_syscall_exit+0x25e/0x290
Jul 11 12:15:25 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:07:28 szimsdb2 kernel: INFO: task jbd2/dm-0-8:4344 blocked for more than 120 seconds.
Jul 11 13:07:35 szimsdb2 kernel: Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 13:07:38 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:07:38 szimsdb2 kernel: jbd2/dm-0-8 D 0000000000000009 0 4344 2 0x00000000
Jul 11 13:07:38 szimsdb2 kernel: ffff88404e61fd20 0000000000000046 ffff88404e61fce8 ffff88404e61fce4
Jul 11 13:07:38 szimsdb2 kernel: ffff88404eec8000 ffff88207fe84a00 0131d4006805a0db ffff8820f0dd6ec0
Jul 11 13:07:38 szimsdb2 kernel: 00000000000003e7 000000150d2edc86 ffff88404f9b3068 ffff88404e61ffd8
Jul 11 13:07:38 szimsdb2 kernel: Call Trace:
Jul 11 13:07:38 szimsdb2 kernel: [] jbd2_journal_commit_transaction+0x19f/0x14f0 [jbd2]
Jul 11 13:07:38 szimsdb2 kernel: [] ? lock_timer_base+0x3c/0x70
Jul 11 13:07:38 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:07:38 szimsdb2 kernel: [] kjournald2+0xb8/0x220 [jbd2]
Jul 11 13:07:38 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:07:38 szimsdb2 kernel: [] ? kjournald2+0x0/0x220 [jbd2]
Jul 11 13:07:38 szimsdb2 kernel: [] kthread+0x9e/0xc0
Jul 11 13:07:38 szimsdb2 kernel: [] child_rip+0xa/0x20
Jul 11 13:07:38 szimsdb2 kernel: [] ? kthread+0x0/0xc0
Jul 11 13:07:38 szimsdb2 kernel: [] ? child_rip+0x0/0x20
Jul 11 13:07:38 szimsdb2 kernel: INFO: task vxdclid:35809 blocked for more than 120 seconds.
Jul 11 13:07:38 szimsdb2 kernel: Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 13:07:38 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:07:38 szimsdb2 kernel: vxdclid D 0000000000000004 0 35809 1 0x00000080
Jul 11 13:07:38 szimsdb2 kernel: ffff88210340bbe8 0000000000000086 ffff88210340bb88 ffffffff8117fa68
Jul 11 13:07:38 szimsdb2 kernel: 0000000000000000 ffff8821350d0000 0004125000000000 ffff884050364160
Jul 11 13:07:38 szimsdb2 kernel: ffff88204e71aa80 0000000000000002 ffff8821242785f8 ffff88210340bfd8
Jul 11 13:07:38 szimsdb2 kernel: Call Trace:
Jul 11 13:07:38 szimsdb2 kernel: [] ? ____cache_alloc_node+0x108/0x160
Jul 11 13:07:38 szimsdb2 kernel: [] ? prepare_to_wait+0x4e/0x80
Jul 11 13:07:38 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]
Jul 11 13:07:38 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:07:38 szimsdb2 kernel: [] jbd2_journal_start+0xb5/0x100 [jbd2]
Jul 11 13:07:38 szimsdb2 kernel: [] ext4_journal_start_sb+0x56/0xe0 [ext4]
Jul 11 13:07:38 szimsdb2 kernel: [] ext4_dirty_inode+0x2a/0x60 [ext4]
Jul 11 13:07:38 szimsdb2 kernel: [] __mark_inode_dirty+0x3b/0x1c0
Jul 11 13:07:38 szimsdb2 kernel: [] touch_atime+0x195/0x1a0
Jul 11 13:07:38 szimsdb2 kernel: [] ext4_file_mmap+0x5d/0x60 [ext4]
Jul 11 13:07:38 szimsdb2 kernel: [] mmap_region+0x400/0x5b0
Jul 11 13:07:38 szimsdb2 kernel: [] do_mmap_pgoff+0x335/0x380
Jul 11 13:07:38 szimsdb2 kernel: [] sys_mmap_pgoff+0x17a/0x340
Jul 11 13:07:38 szimsdb2 kernel: [] sys_mmap+0x29/0x30
Jul 11 13:07:38 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:07:38 szimsdb2 kernel: INFO: task MountAgent:17301 blocked for more than 120 seconds.
Jul 11 13:07:38 szimsdb2 kernel: Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 13:07:38 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:07:38 szimsdb2 kernel: MountAgent D 0000000000000023 0 17301 1 0x00000080
Jul 11 13:07:38 szimsdb2 kernel: ffff886d0ad6fdf8 0000000000000082 0000000000000101 ffff8840fbf578d0
Jul 11 13:07:38 szimsdb2 kernel: ffff886050ac02c0 ffffffff8123bd7f ffff886d0ad6fd78 ffff887c599bd080
Jul 11 13:07:38 szimsdb2 kernel: 0000000000000000 0000000000000024 ffff886d351025f8 ffff886d0ad6ffd8
Jul 11 13:07:40 szimsdb2 kernel: Call Trace:
Jul 11 13:07:49 szimsdb2 kernel: [] ? security_inode_permission+0x1f/0x30
Jul 11 13:07:50 szimsdb2 kernel: [] ? do_filp_open+0x6ea/0xd20
Jul 11 13:07:50 szimsdb2 kernel: [] rwsem_down_failed_common+0x95/0x1d0
Jul 11 13:07:50 szimsdb2 kernel: [] rwsem_down_write_failed+0x23/0x30
Jul 11 13:07:50 szimsdb2 kernel: [] call_rwsem_down_write_failed+0x13/0x20
Jul 11 13:07:50 szimsdb2 kernel: [] ? down_write+0x32/0x40
Jul 11 13:07:50 szimsdb2 kernel: [] sys_mmap_pgoff+0x5b/0x340
Jul 11 13:07:51 szimsdb2 kernel: [] sys_mmap+0x29/0x30
Jul 11 13:07:54 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:07:56 szimsdb2 kernel: INFO: task MountAgent:17303 blocked for more than 120 seconds.
Jul 11 13:08:00 szimsdb2 kernel: Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 13:08:01 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:01 szimsdb2 kernel: MountAgent D 000000000000003f 0 17303 1 0x00000080
Jul 11 13:08:01 szimsdb2 kernel: ffff886d0ad73df8 0000000000000082 0000000000000101 ffff8840fbf578d0
Jul 11 13:08:02 szimsdb2 kernel: ffff886050ac02c0 ffffffff8123bd7f ffff886d0ad73d78 ffff887a64311b80
Jul 11 13:08:02 szimsdb2 kernel: 0000000000000000 0000000000000024 ffff886d11cf7ad8 ffff886d0ad73fd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? security_inode_permission+0x1f/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] ? do_filp_open+0x6ea/0xd20
Jul 11 13:08:02 szimsdb2 kernel: [] rwsem_down_failed_common+0x95/0x1d0
Jul 11 13:08:02 szimsdb2 kernel: [] rwsem_down_write_failed+0x23/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] call_rwsem_down_write_failed+0x13/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? down_write+0x32/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] sys_mmap_pgoff+0x5b/0x340
Jul 11 13:08:02 szimsdb2 kernel: [] sys_mmap+0x29/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:08:02 szimsdb2 kernel: INFO: task MountAgent:17304 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel: Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 13:08:02 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:02 szimsdb2 kernel: MountAgent D 000000000000005b 0 17304 1 0x00000080
Jul 11 13:08:02 szimsdb2 kernel: ffff886d0ad77df8 0000000000000082 0000000000000101 ffff8840fbf578d0
Jul 11 13:08:02 szimsdb2 kernel: ffff886050ac02c0 ffffffff8123bd7f ffff886d0ad77d78 ffff886207b10bc0
Jul 11 13:08:02 szimsdb2 kernel: 0000000000000000 0000000000000024 ffff886d11cf7068 ffff886d0ad77fd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? security_inode_permission+0x1f/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] ? do_filp_open+0x6ea/0xd20
Jul 11 13:08:02 szimsdb2 kernel: [] rwsem_down_failed_common+0x95/0x1d0
Jul 11 13:08:02 szimsdb2 kernel: [] rwsem_down_write_failed+0x23/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] call_rwsem_down_write_failed+0x13/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? down_write+0x32/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] sys_mmap_pgoff+0x5b/0x340
Jul 11 13:08:02 szimsdb2 kernel: [] sys_mmap+0x29/0x30
Jul 11 13:08:02 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:08:02 szimsdb2 kernel: INFO: task HostMonitor:17262 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel: Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 13:08:02 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:02 szimsdb2 kernel: HostMonitor D 000000000000001b 0 17262 1 0x00000080
Jul 11 13:08:02 szimsdb2 kernel: ffff880d51fdbb18 0000000000000082 ffff880d51fdbab8 ffffffff8117fa68
Jul 11 13:08:02 szimsdb2 kernel: 0000000000000000 ffff8800693f7000 0004125000000000 ffff88204e71aa60
Jul 11 13:08:02 szimsdb2 kernel: ffff88804f230880 0000000000000002 ffff88204a13a5f8 ffff880d51fdbfd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? ____cache_alloc_node+0x108/0x160
Jul 11 13:08:02 szimsdb2 kernel: [] ? prepare_to_wait+0x4e/0x80
Jul 11 13:08:02 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] jbd2_journal_start+0xb5/0x100 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_journal_start_sb+0x56/0xe0 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_dirty_inode+0x2a/0x60 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] __mark_inode_dirty+0x3b/0x1c0
Jul 11 13:08:02 szimsdb2 kernel: [] touch_atime+0x195/0x1a0
Jul 11 13:08:02 szimsdb2 kernel: [] generic_file_aio_read+0x380/0x700
Jul 11 13:08:02 szimsdb2 kernel: [] do_sync_read+0xfa/0x140
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] ? invalidate_interrupt1+0xe/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? security_file_permission+0x16/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] vfs_read+0xb5/0x1a0
Jul 11 13:08:02 szimsdb2 kernel: [] ? fget_light_pos+0x3f/0x50
Jul 11 13:08:02 szimsdb2 kernel: [] sys_read+0x51/0xb0
Jul 11 13:08:02 szimsdb2 kernel: [] ? __audit_syscall_exit+0x25e/0x290
Jul 11 13:08:02 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:08:02 szimsdb2 kernel: INFO: task HostMonitor:17278 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel: Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 13:08:02 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:02 szimsdb2 kernel: HostMonitor D 0000000000000031 0 17278 1 0x00000080
Jul 11 13:08:02 szimsdb2 kernel: ffff886d0ad2f998 0000000000000082 0000000000000282 0000000000000030
Jul 11 13:08:02 szimsdb2 kernel: ffff8820f0c16f28 ffff886080010e40 ffff886d0ad2f938 0000000000000002
Jul 11 13:08:02 szimsdb2 kernel: ffff882080021ba8 0000000000000003 ffff886d353c5068 ffff886d0ad2ffd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? prepare_to_wait+0x4e/0x80
Jul 11 13:08:02 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ? cache_alloc_refill+0x15b/0x240
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] jbd2_journal_start+0xb5/0x100 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_journal_start_sb+0x56/0xe0 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_dirty_inode+0x2a/0x60 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] __mark_inode_dirty+0x3b/0x1c0
Jul 11 13:08:02 szimsdb2 kernel: [] file_update_time+0xf2/0x170
Jul 11 13:08:02 szimsdb2 kernel: [] ? __sb_start_write+0x80/0x120
Jul 11 13:08:02 szimsdb2 kernel: [] ? wake_bit_function+0x0/0x50
Jul 11 13:08:02 szimsdb2 kernel: [] ? ext4_da_get_block_prep+0x0/0x380 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] __block_page_mkwrite+0x3b/0x140
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_page_mkwrite+0x121/0x360 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ? cpumask_next_and+0x29/0x50
Jul 11 13:08:02 szimsdb2 kernel: [] do_wp_page+0x640/0x920
Jul 11 13:08:02 szimsdb2 kernel: [] handle_pte_fault+0x2cd/0xb20
Jul 11 13:08:02 szimsdb2 kernel: [] ? try_to_wake_up+0x24e/0x3e0
Jul 11 13:08:02 szimsdb2 kernel: [] handle_mm_fault+0x2aa/0x3f0
Jul 11 13:08:02 szimsdb2 kernel: [] __do_page_fault+0x141/0x500
Jul 11 13:08:02 szimsdb2 kernel: [] do_page_fault+0x3e/0xa0
Jul 11 13:08:02 szimsdb2 kernel: [] page_fault+0x25/0x30
Jul 11 13:08:02 szimsdb2 kernel: INFO: task hekad-daemon:8274 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel: Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 13:08:02 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:02 szimsdb2 kernel: hekad-daemon D 000000000000000d 0 8274 1 0x00000080
Jul 11 13:08:02 szimsdb2 kernel: ffff886d0ae17a88 0000000000000086 ffff88804f9c5660 ffff8880526c7900
Jul 11 13:08:02 szimsdb2 kernel: ffff886d0ae179e8 ffffffff81014b39 ffff886d0ae17a38 ffffffff810b295f
Jul 11 13:08:02 szimsdb2 kernel: ffff886d0ae17a08 0000000000000000 ffff88608022b068 ffff886d0ae17fd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? read_tsc+0x9/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? ktime_get_ts+0xbf/0x100
Jul 11 13:08:02 szimsdb2 kernel: [] ? prepare_to_wait+0x4e/0x80
Jul 11 13:08:02 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ? cache_alloc_refill+0x15b/0x240
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] jbd2_journal_start+0xb5/0x100 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_journal_start_sb+0x56/0xe0 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_dirty_inode+0x2a/0x60 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] __mark_inode_dirty+0x3b/0x1c0
Jul 11 13:08:02 szimsdb2 kernel: [] file_update_time+0xf2/0x170
Jul 11 13:08:02 szimsdb2 kernel: [] __generic_file_aio_write+0x230/0x490
Jul 11 13:08:02 szimsdb2 kernel: [] generic_file_aio_write+0x88/0x100
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_file_write+0x58/0x190 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ? handle_mm_fault+0x2aa/0x3f0
Jul 11 13:08:02 szimsdb2 kernel: [] do_sync_write+0xfa/0x140
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] ? call_rcu+0xe/0x10
Jul 11 13:08:02 szimsdb2 kernel: [] ? d_free+0x3f/0x60
Jul 11 13:08:02 szimsdb2 kernel: [] ? apic_timer_interrupt+0xe/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? do_sync_write+0x0/0x140
Jul 11 13:08:02 szimsdb2 kernel: [] vfs_write+0xb8/0x1a0
Jul 11 13:08:02 szimsdb2 kernel: [] ? fget_light_pos+0x3f/0x50
Jul 11 13:08:02 szimsdb2 kernel: [] sys_write+0x51/0xb0
Jul 11 13:08:02 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b
Jul 11 13:08:02 szimsdb2 kernel: INFO: task rs:main Q:Reg:52457 blocked for more than 120 seconds.
Jul 11 13:08:02 szimsdb2 kernel: Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 13:08:02 szimsdb2 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 11 13:08:02 szimsdb2 kernel: rs:main Q:Reg D 0000000000000005 0 52457 1 0x00000080
Jul 11 13:08:02 szimsdb2 kernel: ffff88404f8ffa88 0000000000000086 ffff8821350d0498 0000000000000000
Jul 11 13:08:02 szimsdb2 kernel: ffff88404f8ffa08 ffff88405234ec00 ffff88404f8ffa58 ffffffffa019ee13
Jul 11 13:08:02 szimsdb2 kernel: ffff88404f8ffa38 0000000300000001 ffff8824bd7d3ad8 ffff88404f8fffd8
Jul 11 13:08:02 szimsdb2 kernel: Call Trace:
Jul 11 13:08:02 szimsdb2 kernel: [] ? ext4_mark_inode_dirty+0x83/0x1d0 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ? prepare_to_wait+0x4e/0x80
Jul 11 13:08:02 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ? cache_alloc_refill+0x15b/0x240
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] jbd2_journal_start+0xb5/0x100 [jbd2]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_journal_start_sb+0x56/0xe0 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_dirty_inode+0x2a/0x60 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] __mark_inode_dirty+0x3b/0x1c0
Jul 11 13:08:02 szimsdb2 kernel: [] file_update_time+0xf2/0x170
Jul 11 13:08:02 szimsdb2 kernel: [] __generic_file_aio_write+0x230/0x490
Jul 11 13:08:02 szimsdb2 kernel: [] generic_file_aio_write+0x88/0x100
Jul 11 13:08:02 szimsdb2 kernel: [] ext4_file_write+0x58/0x190 [ext4]
Jul 11 13:08:02 szimsdb2 kernel: [] ? handle_mm_fault+0x2aa/0x3f0
Jul 11 13:08:02 szimsdb2 kernel: [] do_sync_write+0xfa/0x140
Jul 11 13:08:02 szimsdb2 kernel: [] ? perf_event_task_sched_out+0x2e/0x70
Jul 11 13:08:02 szimsdb2 kernel: [] ? autoremove_wake_function+0x0/0x40
Jul 11 13:08:02 szimsdb2 kernel: [] ? apic_timer_interrupt+0xe/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] ? security_file_permission+0x16/0x20
Jul 11 13:08:02 szimsdb2 kernel: [] vfs_write+0xb8/0x1a0
Jul 11 13:08:02 szimsdb2 kernel: [] ?
fget_light_pos+0x3f/0x50 Jul 11 13:08:02 szimsdb2 kernel: [] sys_write+0x51/0xb0 Jul 11 13:08:02 szimsdb2 kernel: [] system_call_fastpath+0x16/0x1b Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 22 sec Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 23 sec Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 24 sec Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 25 sec Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 26 sec Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 27 sec Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 28 sec Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20057 Port h[GAB_USER_CLIENT (refcount 0)] process 16921 inactive 29 sec Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20058 Port h[GAB_USER_CLIENT (refcount 0)] process 16921: heartbeat failed, killing process Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20059 Port h[GAB_USER_CLIENT (refcount 0)] heartbeat interval 30000 msec. 
Statistics: Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 0 ~ 6000 msec: 590166154 Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 6000 ~ 12000 msec: 0 Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 12000 ~ 18000 msec: 0 Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 18000 ~ 24000 msec: 0 Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20129 Port h: heartbeats in 24000 ~ 30000 msec: 0 Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20088 System information: Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20089 number of cpu: 96 Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20090 physical memory: 528940012 K Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20091 free memory: 666056 K Jul 11 13:08:02 szimsdb2 kernel: GAB INFO V-15-1-20041 Port h: client process failure: killing process Jul 11 13:08:02 szimsdb2 kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure Jul 11 13:08:05 szimsdb2 kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure Jul 11 13:08:20 szimsdb2 kernel: GAB WARNING V-15-1-20035 Port h attempting to kill process due to client process failure Jul 11 13:08:26 szimsdb2 AgentFramework[17254]: VCS ERROR V-16-2-13027 Thread(140178848986880) Resource(vol_u01) - monitor procedure did not complete within the expected time. Jul 11 13:08:26 szimsdb2 AgentFramework[17253]: VCS ERROR V-16-2-13027 Thread(139838914823936) Resource(VCShm) - monitor procedure did not complete within the expected time. 
Jul 11 13:08:26 szimsdb2 abrt[29670]: Saved core dump of pid 16921 (/opt/VRTSvcs/bin/had) to /var/spool/abrt/ccpp-2023-07-11-13:08:26-16921 (24588288 bytes)
Jul 11 13:08:26 szimsdb2 abrtd: Directory 'ccpp-2023-07-11-13:08:26-16921' creation detected
Jul 11 13:08:27 szimsdb2 kernel: GAB WARNING V-15-1-20161 Port h client process killed, GAB will initiate regmon action syslog after 200 sec
Jul 11 13:08:27 szimsdb2 kernel: GAB INFO V-15-1-20032 Port h closed
Jul 11 13:08:27 szimsdb2 AgentFramework[17254]: VCS ERROR V-16-2-13120 Thread(140178960766752) Error receiving from the engine. Agent(Volume) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17253]: VCS ERROR V-16-2-13120 Thread(139839033902880) Error receiving from the engine. Agent(HostMonitor) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17247]: VCS ERROR V-16-2-13120 Thread(140346738489120) Error receiving from the engine. Agent(Mount) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17248]: VCS ERROR V-16-2-13120 Thread(140104745289504) Error receiving from the engine. Agent(NIC) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17252]: VCS ERROR V-16-2-13120 Thread(140492977714976) Error receiving from the engine. Agent(Oracle) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17250]: VCS ERROR V-16-2-13120 Thread(140163071121184) Error receiving from the engine. Agent(Netlsnr) is exiting.
Jul 11 13:08:27 szimsdb2 AgentFramework[17246]: VCS ERROR V-16-2-13120 Thread(139647360829216) Error receiving from the engine. Agent(IP) is exiting.
Jul 11 13:08:27 szimsdb2 hashadow[16930]: VCS ERROR V-16-1-11103 VCS exited. It will restart
Jul 11 13:08:29 szimsdb2 kernel: AMF NOTICE V-292-1-67 Signal received while waiting for event on reaper 'VCSMountAgent'. Returning.
Jul 11 13:08:29 szimsdb2 kernel: AMF NOTICE V-292-1-67 Signal received while waiting for event on reaper 'VCSNetlsnrAgent'. Returning.
Jul 11 13:08:29 szimsdb2 kernel: AMF NOTICE V-292-1-67 Signal received while waiting for event on reaper 'VCSOracleAgent'. Returning.
Jul 11 13:08:33 szimsdb2 abrtd: Package 'VRTSvcs' isn't signed with proper key
Jul 11 13:08:33 szimsdb2 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2023-07-11-13:08:26-16921' exited with 1
Jul 11 13:08:33 szimsdb2 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-11-13:08:26-16921'
Jul 11 13:08:37 szimsdb2 abrt[30169]: Saved core dump of pid 17243 (/opt/VRTSvcs/bin/Script51Agent) to /var/spool/abrt/ccpp-2023-07-11-13:08:31-17243 (15106048 bytes)
Jul 11 13:08:37 szimsdb2 kernel: AMF NOTICE V-292-1-68 The reaper 'DiskGroup' removed. Returning.
Jul 11 13:08:37 szimsdb2 abrtd: Directory 'ccpp-2023-07-11-13:08:31-17243' creation detected
Jul 11 13:08:37 szimsdb2 abrtd: Package 'VRTSvcs' isn't signed with proper key
Jul 11 13:08:37 szimsdb2 Had[30223]: VCS NOTICE V-16-1-53071 Diagnostics directory moved to /var/VRTSvcs/diag/had.1689052117, please check its contents and contact Veritas Technical Support
Jul 11 13:08:37 szimsdb2 abrtd: 'post-create' on '/var/spool/abrt/ccpp-2023-07-11-13:08:31-17243' exited with 1
Jul 11 13:08:37 szimsdb2 abrtd: Deleting problem directory '/var/spool/abrt/ccpp-2023-07-11-13:08:31-17243'
Jul 11 13:08:37 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10619 'HAD' starting on: szimsdb2
Jul 11 13:08:37 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10620 Waiting for local cluster configuration status
Jul 11 13:08:37 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10625 Local cluster configuration valid
Jul 11 13:08:37 szimsdb2 Had[30225]: VCS NOTICE V-16-1-11034 Registering for cluster membership
Jul 11 13:08:37 szimsdb2 Had[30225]: VCS NOTICE V-16-1-11035 Waiting for cluster membership
Jul 11 13:08:41 szimsdb2 kernel: GAB INFO V-15-1-20036 Port h[GAB_USER_CLIENT (refcount 0)] gen 4eb71e membership 0-1
Jul 11 13:08:41 szimsdb2 Had[30225]: VCS INFO V-16-1-10077 Received new cluster membership
Jul 11 13:08:41 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10086 System (Node '0') is in Regular Membership - Membership: 0x3
Jul 11 13:08:41 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10086 System szimsdb2 (Node '1') is in Regular Membership - Membership: 0x3
Jul 11 13:08:41 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10075 Building from remote system
Jul 11 13:08:42 szimsdb2 Had[30225]: VCS NOTICE V-16-1-10066 Entering RUNNING state
Jul 11 13:08:42 szimsdb2 Had[30225]: VCS NOTICE V-16-1-50311 VCS Engine: running with security OFF
Jul 11 13:08:42 szimsdb2 AgentFramework[30289]: VCS NOTICE V-16-1-53071 Diagnostics directory moved to /var/VRTSvcs/diag//agents/DiskGroup.1689052122, please check its contents and contact Veritas Technical Support
Jul 11 13:11:23 szimsdb2 AgentFramework[30289]: VCS ERROR V-16-2-13027 Thread(139672226191104) Resource(dg_szimsdg) - monitor procedure did not complete within the expected time.
Jul 11 13:11:23 szimsdb2 Had[30225]: VCS ERROR V-16-2-13027 (szimsdb2) Resource(dg_szimsdg) - monitor procedure did not complete within the expected time.
Jul 11 13:11:23 szimsdb2 Had[30225]: VCS CRITICAL V-16-1-50086 CPU usage on szimsdb2 is 93%
Jul 11 13:16:53 szimsdb2 Had[30225]: VCS CRITICAL V-16-1-50086 CPU usage on szimsdb2 is 93%
Jul 11 13:20:55 szimsdb2 Had[30225]: VCS CRITICAL V-16-1-50086 CPU usage on szimsdb2 is 91%
Jul 11 13:20:55 szimsdb2 Had[30225]: VCS CRITICAL V-16-1-50086 Swap usage on szimsdb2 is 96%
Jul 11 13:22:50 szimsdb2 AgentFramework[30289]: VCS ERROR V-16-2-13027 Thread(139672227243776) Resource(dg_szimsdg) - monitor procedure did not complete within the expected time.
Jul 11 13:22:51 szimsdb2 Had[30225]: VCS ERROR V-16-2-13027 (szimsdb2) Resource(dg_szimsdg) - monitor procedure did not complete within the expected time.
Jul 11 13:25:50 anbob2 kernel: AMF WARNING V-292-1-44 AMF can no longer monitor DGoffline events. Notifying reapers.
Jul 11 13:25:50 anbob2 kernel: AMF WARNING V-292-1-44 AMF can no longer monitor DGonline events. Notifying reapers.
Jul 11 13:25:50 anbob2 imfd[22155]: IMFD ERROR V-292-2-3030 Function:oimf_getnotification from library:libusnp_vxnotify.so failed with error:Failed to read event from vxnotify. Possibly vxnotify process got killed, errno = 0
Jul 11 13:26:01 anbob2 Had[30225]: VCS ERROR V-16-2-13051 (anbob2) Agent(NIC) is exiting because another agent with process-id(30277) is already running for this type
Jul 11 13:27:21 anbob2 kernel: oracle invoked oom-killer: gfp_mask=0x200d2, order=0, oom_adj=0, oom_score_adj=0
Jul 11 13:27:21 anbob2 kernel: oracle cpuset=/ mems_allowed=0-3
Jul 11 13:27:21 anbob2 kernel: Pid: 6148, comm: oracle Tainted: P -- ------------ 2.6.32-696.el6.x86_64 #1
Jul 11 13:27:21 anbob2 kernel: Call Trace:
Jul 11 13:27:21 anbob2 kernel: [] ? dump_header+0x90/0x1b0
Jul 11 13:27:21 anbob2 kernel: [] ? security_real_capable_noaudit+0x3c/0x70
Jul 11 13:27:21 anbob2 kernel: [] ? oom_kill_process+0x82/0x2a0
Jul 11 13:27:21 anbob2 kernel: [] ? select_bad_process+0xe1/0x120
Jul 11 13:27:21 anbob2 kernel: [] ? out_of_memory+0x220/0x3c0
Jul 11 13:27:21 anbob2 kernel: [] ? __alloc_pages_nodemask+0x93c/0x950
Jul 11 13:27:21 anbob2 kernel: [] ? alloc_pages_current+0xaa/0x110
Jul 11 13:27:21 anbob2 kernel: [] ? __page_cache_alloc+0x87/0x90
Jul 11 13:27:21 anbob2 kernel: [] ? find_or_create_page+0x4f/0xb0
Jul 11 13:27:21 anbob2 kernel: [] ? vx_page_alloc+0x1d1/0xd00 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? vx_read_ahead_detect+0x221/0x610 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? vx_do_getpage+0x505/0x2490 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? dev_hard_start_xmit+0x21c/0x490
Jul 11 13:27:21 anbob2 kernel: [] ? vx_rwsleep_rec_lock+0x7d/0x110 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? vx_recsmp_trylock+0x1/0x20 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? vx_iglock3+0xfb/0x110 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? vx_getpage1+0x3f9/0x940 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? wake_bit_function+0x0/0x50
Jul 11 13:27:21 anbob2 kernel: [] ? vx_fault+0x2c1/0x6c0 [vxfs]
Jul 11 13:27:21 anbob2 kernel: [] ? autoremove_wake_function+0x16/0x40
Jul 11 13:27:21 anbob2 kernel: [] ? __wake_up_bit+0x31/0x40
Jul 11 13:27:21 anbob2 kernel: [] ? __do_fault+0x54/0x530
Jul 11 13:27:21 anbob2 kernel: [] ? handle_pte_fault+0xf7/0xb20
Jul 11 13:27:21 anbob2 kernel: [] ? sock_aio_read+0x1a1/0x1b0
Jul 11 13:27:21 anbob2 kernel: [] ? handle_mm_fault+0x2aa/0x3f0
Jul 11 13:27:21 anbob2 kernel: [] ? __do_page_fault+0x141/0x500
Jul 11 13:27:21 anbob2 kernel: [] ? security_file_permission+0x16/0x20
Jul 11 13:27:21 anbob2 kernel: [] ? do_page_fault+0x3e/0xa0
Jul 11 13:27:21 anbob2 kernel: [] ? page_fault+0x25/0x30
Jul 11 13:27:21 anbob2 kernel: Mem-Info:
Jul 11 13:27:21 anbob2 kernel: Node 0 DMA per-cpu:
...
Jul 11 13:27:21 anbob2 kernel: CPU 95: hi: 186, btch: 31 usd: 0
Jul 11 13:27:21 anbob2 kernel: active_anon:51373947 inactive_anon:3603515 isolated_anon:0
Jul 11 13:27:21 anbob2 kernel: active_file:270 inactive_file:77 isolated_file:610
Jul 11 13:27:21 anbob2 kernel: unevictable:594868 dirty:0 writeback:0 unstable:0
Jul 11 13:27:21 anbob2 kernel: free:164590 slab_reclaimable:212653 slab_unreclaimable:274347
Jul 11 13:27:21 anbob2 kernel: mapped:39970849 shmem:47754858 pagetables:69126167 bounce:0
Jul 11 13:27:21 anbob2 kernel: Node 0 DMA free:15744kB min:0kB low:0kB high:0kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15192kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 1659 128919 128919
Jul 11 13:27:21 anbob2 kernel: Node 0 DMA32 free:509480kB min:432kB low:540kB high:648kB active_anon:12kB inactive_anon:32kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):8kB present:1698848kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:0kB slab_reclaimable:19004kB slab_unreclaimable:5456kB kernel_stack:0kB pagetables:56524kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:9 all_unreclaimable? no
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 0 127260 127260
Jul 11 13:27:21 anbob2 kernel: Node 0 Normal free:33252kB min:33284kB low:41604kB high:49924kB active_anon:41907544kB inactive_anon:3160236kB active_file:0kB inactive_file:0kB unevictable:48008kB isolated(anon):0kB isolated(file):1664kB present:130314240kB mlocked:29612kB dirty:0kB writeback:0kB mapped:33003860kB shmem:38857440kB slab_reclaimable:232004kB slab_unreclaimable:590868kB kernel_stack:35184kB pagetables:78692032kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:905 all_unreclaimable? no
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 0 0 0
Jul 11 13:27:21 anbob2 kernel: Node 1 Normal free:32980kB min:33812kB low:42264kB high:50716kB active_anon:47390260kB inactive_anon:3399420kB active_file:708kB inactive_file:76kB unevictable:237212kB isolated(anon):0kB isolated(file):0kB present:132382720kB mlocked:75736kB dirty:0kB writeback:0kB mapped:37530640kB shmem:43887520kB slab_reclaimable:204916kB slab_unreclaimable:155872kB kernel_stack:56608kB pagetables:63008228kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:1089 all_unreclaimable? yes
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 0 0 0
Jul 11 13:27:21 anbob2 kernel: Node 2 Normal free:33780kB min:33812kB low:42264kB high:50716kB active_anon:59330284kB inactive_anon:2476360kB active_file:172kB inactive_file:356kB unevictable:25228kB isolated(anon):0kB isolated(file):768kB present:132382720kB mlocked:17052kB dirty:0kB writeback:0kB mapped:46825112kB shmem:54991776kB slab_reclaimable:196720kB slab_unreclaimable:158944kB kernel_stack:32512kB pagetables:68487804kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:874 all_unreclaimable? yes
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 0 0 0
Jul 11 13:27:21 anbob2 kernel: Node 3 Normal free:33124kB min:33812kB low:42264kB high:50716kB active_anon:56867688kB inactive_anon:5378012kB active_file:380kB inactive_file:84kB unevictable:2069024kB isolated(anon):0kB isolated(file):0kB present:132382720kB mlocked:445680kB dirty:0kB writeback:0kB mapped:42523776kB shmem:53282696kB slab_reclaimable:197968kB slab_unreclaimable:186248kB kernel_stack:39648kB pagetables:66260080kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:625 all_unreclaimable? yes
Jul 11 13:27:21 anbob2 kernel: lowmem_reserve[]: 0 0 0 0
Jul 11 13:27:21 anbob2 kernel: Node 0 DMA: 2*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 1*2048kB 3*4096kB = 15744kB
Jul 11 13:27:21 anbob2 kernel: Node 0 DMA32: 522*4kB 132*8kB 56*16kB 137*32kB 263*64kB 153*128kB 89*256kB 67*512kB 20*1024kB 13*2048kB 88*4096kB = 509480kB
Jul 11 13:27:21 anbob2 kernel: Node 0 Normal: 5587*4kB 693*8kB 179*16kB 16*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 33252kB
Jul 11 13:27:21 anbob2 kernel: Node 1 Normal: 1567*4kB 505*8kB 325*16kB 186*32kB 138*64kB 5*128kB 2*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 32980kB
Jul 11 13:27:21 anbob2 kernel: Node 2 Normal: 7751*4kB 5*8kB 1*16kB 7*32kB 9*64kB 7*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 33780kB
Jul 11 13:27:21 anbob2 kernel: Node 3 Normal: 1233*4kB 406*8kB 599*16kB 322*32kB 61*64kB 7*128kB 1*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 33124kB
Jul 11 13:27:21 anbob2 kernel: 49341655 total pagecache pages
Jul 11 13:27:21 anbob2 kernel: 1548810 pages in swap cache
Jul 11 13:27:21 anbob2 kernel: Swap cache stats: add 12245073, delete 10696263, find 4492224732/4492556451
Jul 11 13:27:21 anbob2 kernel: Free swap = 0kB
Jul 11 13:27:21 anbob2 kernel: Total swap = 32767996kB
Jul 11 13:27:21 anbob2 kernel: 134152191 pages RAM
Jul 11 13:27:21 anbob2 kernel: 1917188 pages reserved
Jul 11 13:27:21 anbob2 kernel: 1755985666 pages shared
Jul 11 13:27:21 anbob2 kernel: 86997652 pages non-shared
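The global counters in the Mem-Info section of the OOM dump above are page counts, not kilobytes, so their size is easy to misread. The following is a minimal sketch that converts a few of them to GiB and to a share of physical memory. The numbers are copied from the dump; the 4 KB page size is an assumption (the x86_64 default), which the script cross-checks against the per-zone kB figures. The helper name `pages_to_gib` is made up for illustration.

```python
# Page counters copied from the Mem-Info section of the OOM dump.
PAGE_KB = 4  # assumption: 4 KB base pages (x86_64 default)

counters_pages = {
    "pages RAM": 134152191,
    "pagetables": 69126167,
    "active_anon": 51373947,
    "shmem": 47754858,
    "free": 164590,
}


def pages_to_gib(pages, page_kb=PAGE_KB):
    """Convert a page count to GiB."""
    return pages * page_kb / 1024 / 1024


total_pages = counters_pages["pages RAM"]
for name, pages in counters_pages.items():
    share = 100 * pages / total_pages
    print(f"{name:<12} {pages_to_gib(pages):8.1f} GiB  {share:5.1f}% of RAM")

# Cross-check the page-size assumption: the per-zone "pagetables:...kB" and
# "free:...kB" values (DMA, DMA32, Node 0-3 Normal) must add up to the
# global page counters multiplied by the page size.
zone_pagetables_kb = [0, 56524, 78692032, 63008228, 68487804, 66260080]
zone_free_kb = [15744, 509480, 33252, 32980, 33780, 33124]
pagetables_match = sum(zone_pagetables_kb) == counters_pages["pagetables"] * PAGE_KB
free_match = sum(zone_free_kb) == counters_pages["free"] * PAGE_KB
print("page size cross-check ok:", pagetables_match and free_match)
```

With these inputs, page tables alone come to roughly half of physical memory (about 264 GiB of about 512 GiB), which matches the diagnosis for this case: after the HugePages setting was changed, the page tables consumed a large amount of memory.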
Note:
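As the log shows, processes first appeared in D state (uninterruptible sleep), and jbd2 hung for more than 200 seconds, which very likely blocked writes to the ext4 file system. There were not many D-state processes and they did not persist for long. If a large number of processes are in D state, it is worth analyzing the storage driver layer (see: Server hang due to “block devices” with status “pending syncing” on driver layer).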
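To judge how many tasks were blocked and for how long, the kernel hung-task warnings can be pulled out of the OS log. The sketch below is one way to do that under a few assumptions: the log uses the classic syslog line format shown in this article, and the function name `hung_tasks` and the sample list are illustrative only. The two sample lines are taken from the log above.

```python
import re
from collections import Counter

# Matches the kernel hung-task warning, for example:
#   "Jul 11 13:08:02 host kernel: INFO: task hekad-daemon:8274 blocked for more than 120 seconds."
# The task name may itself contain colons ("rs:main Q:Reg"), so the name
# group is greedy and the PID is the last ":<digits>" before " blocked".
HUNG_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s+\S+\s+kernel:\s+"
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(lines):
    """Yield (timestamp, task name, pid, threshold seconds) per hung-task warning."""
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            yield m.group("ts"), m.group("name"), int(m.group("pid")), int(m.group("secs"))


# Sample lines copied from the OS log in this case.
sample = [
    "Jul 11 13:08:02 szimsdb2 kernel: INFO: task hekad-daemon:8274 blocked for more than 120 seconds.",
    "Jul 11 13:08:02 szimsdb2 kernel: [] start_this_handle+0x25a/0x480 [jbd2]",
    "Jul 11 13:08:02 szimsdb2 kernel: INFO: task rs:main Q:Reg:52457 blocked for more than 120 seconds.",
]

found = list(hung_tasks(sample))
per_timestamp = Counter(ts for ts, _, _, _ in found)
for ts, name, pid, secs in found:
    print(f"{ts}  pid={pid:<6} blocked>{secs}s  {name}")
print("warnings per timestamp:", dict(per_timestamp))
```

To run it against a real log, pass an open file instead of `sample`, for example `hung_tasks(open("/var/log/messages"))`; that path is the usual default on RHEL-like systems and may differ on other hosts. A handful of warnings clustered at one timestamp points to a short stall, while warnings repeating over many minutes would justify looking at the storage layer.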