Troubleshooting a Linux 7 panic: system crash shows exception RIP: pagetypeinfo_showfree_print
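Recently the operating system on node 1 of an Oracle RAC on Linux 7 environment rebooted. Once again, neither the DB layer nor the CRS layer logged any errors. Fortunately kdump was configured on the OS, so a vmcore file was generated. Analysis of the vmcore shows that the panic was triggered while a cat command was running: a CPU hit a hard lockup and the system crashed. The call stack shows exception RIP pagetypeinfo_showfree_print.
Error log and call stack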
crash> bt
PID: 27901  TASK: ffff938a4d4f1fa0  CPU: 14  COMMAND: "cat"
 #0 [ffff9483bf488e48] crash_nmi_callback at ffffffffb8c551d7
 #1 [ffff9483bf488e58] nmi_handle at ffffffffb931d8cc
 #2 [ffff9483bf488eb0] do_nmi at ffffffffb931dba8
 #3 [ffff9483bf488ef0] end_repeat_nmi at ffffffffb931cd69
    [exception RIP: pagetypeinfo_showfree_print+104]
    RIP: ffffffffb8db7173  RSP: ffff938b9fcbfda0  RFLAGS: 00000006
    RAX: fffff0c9946d7020  RBX: ffff96073ffd5528  RCX: 0000000000000000
    RDX: 00000000001c7764  RSI: ffffffffb9676ab1  RDI: 0000000000000000
    RBP: ffff938b9fcbfdd0   R8: 000000000000000a   R9: 00000000fffffffe
    R10: 0000000000000000  R11: ffff938b9fcbfc36  R12: ffff942b97758240
    R13: ffffffffb942f730  R14: ffff96073ffd5000  R15: ffff96073ffd5180
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
 #4 [ffff938b9fcbfda0] pagetypeinfo_showfree_print at ffffffffb8db7173
 #5 [ffff938b9fcbfdd8] walk_zones_in_node at ffffffffb8db74df
 #6 [ffff938b9fcbfe20] pagetypeinfo_show at ffffffffb8db7a29
 #7 [ffff938b9fcbfe48] seq_read at ffffffffb8e45c3c
 #8 [ffff938b9fcbfeb8] proc_reg_read at ffffffffb8e95070
 #9 [ffff938b9fcbfed8] vfs_read at ffffffffb8e1f2af
#10 [ffff938b9fcbff08] sys_read at ffffffffb8e2017f
#11 [ffff938b9fcbff50] system_call_fastpath at ffffffffb932579b
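This environment runs Linux 7.7. The problem exists on both Oracle Linux and Red Hat Enterprise Linux; because the kernel branches differ, the bug is tracked under different numbers and the fixed kernel versions differ.
For OEL it is Bug 32921246 – [UEK-5-U5] Reading /proc/pagetypeinfo on large systems can cause lockup
For RHEL it is Bug 1757943 – Hard lockup in free_one_page()->_raw_spin_lock() because sosreport command is reading from /proc/pagetypeinfo
Oracle has not published the root cause; Doc ID 3000138.1 only records the symptom on UEK kernel 5. Red Hat documents the cause as a kernel bug: when cat /proc/pagetypeinfo reads the free page counts, walk_zones_in_node calls pagetypeinfo_showfree_print for every zone, and that callback walks each free list entry by entry. walk_zones_in_node holds zone->lock with interrupts disabled around the callback, so on a system with a large amount of memory the walk can run long enough to trigger a hard lockup.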
/* Print out the free pages at each order for each migatetype */
static int pagetypeinfo_showfree(struct seq_file *m, void *arg)
{
	int order;
	pg_data_t *pgdat = (pg_data_t *)arg;

	/* Print header */
	seq_printf(m, "%-43s ", "Free pages count per migrate type at order");
	for (order = 0; order < MAX_ORDER; ++order)
		seq_printf(m, "%6d ", order);
	seq_putc(m, '\n');

	walk_zones_in_node(m, pgdat, pagetypeinfo_showfree_print);	/* <----- callback seen in the crash stack */

	return 0;
}

/* Walk all the zones in a node and print using a callback */
static void walk_zones_in_node(struct seq_file *m, pg_data_t *pgdat,
		void (*print)(struct seq_file *m, pg_data_t *, struct zone *))
{ ... }
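What is /proc/pagetypeinfo
The /proc/pagetypeinfo entry on Linux provides information about the free memory pages tracked by the page allocator. For each zone it shows how many free blocks exist at each order, broken down by migrate type (Unmovable, Movable, Reclaimable, and so on). This helps system administrators and developers understand the system's memory usage patterns and tune performance. It is commonly used to analyze memory fragmentation, usually together with /proc/zoneinfo, /proc/buddyinfo, and /proc/vmstat. In current Linux kernels the top-level memory management concept is the node. Each node is divided into one or more zones, and each zone is further divided by migrate type. pagetypeinfo reports the detailed state of every migrate type in every zone on the system, which makes it more detailed than /proc/buddyinfo.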
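To make the node / zone / migrate type layout concrete, here is a minimal Python sketch that parses text in the /proc/pagetypeinfo format into a dictionary. The names parse_free_counts, ROW, and SAMPLE are made up for this illustration, and the sample values are copied from the example output in this article. The sketch deliberately parses saved text rather than opening /proc/pagetypeinfo, because reading that file on an unpatched kernel is exactly what triggers the lockup described here.

```python
import re

# Sample text in the /proc/pagetypeinfo layout (saved output, not read live).
SAMPLE = """\
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone   Normal, type    Unmovable    870    530    391    157    103     41      9      2      1      0      0
Node    0, zone   Normal, type      Movable   5886   9235   5728   4072   1561    324    115     41     12      4  13018
"""

# Matches only the per-migrate-type rows: "Node N, zone Z, type T  c0 c1 ..."
ROW = re.compile(r"^Node\s+(\d+), zone\s+(\S+), type\s+(\S+)\s+([\d\s]+)$")


def parse_free_counts(text):
    """Return {(node, zone, migrate_type): [free block count at order 0, 1, ...]}."""
    result = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # header lines and other sections are skipped
        node, zone, migrate_type, counts = match.groups()
        result[(int(node), zone, migrate_type)] = [int(c) for c in counts.split()]
    return result


if __name__ == "__main__":
    for key, counts in parse_free_counts(SAMPLE).items():
        print(key, counts)
```

Each key identifies one row of the file, and the list index is the order, so counts[10] is the number of free blocks of 2^10 pages for that node, zone, and migrate type.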
The following explanation is taken from https://github.com/netdata/netdata/issues/6802:
The Linux kernel splits its memory space into zones (e.g., for x86_64):
DMA: 0 to 16MB, for legacy reasons
DMA32: 16MB to 4GB, for 32-bit hardware
Normal: 4GB and above, the standard addressing

Each of these zones is managed by the buddy allocator in blocks of 2^order pages, up to order 10 (2^10 pages, which is 4MB with a 4KB page size). When a block is released, the allocator tries to merge it with its buddy to form a higher-order block. If almost all free blocks are low-order, it usually indicates memory fragmentation. Most of the time this is caused by kernel caches that use unmovable pages. You can release the largest consumers (inodes and dentries) by running "echo 3 > /proc/sys/vm/drop_caches".

A "clean" (i.e. non-fragmented) machine will have free blocks at the high orders (8, 9, 10):

$ cat /proc/pagetypeinfo
Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      0      0      0      0      2      1      1      0      1      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable      1      0      0      0      0      0      1      1      1      1      0
Node    0, zone    DMA32, type      Movable      2      1      2      0      1      3      2      1      2      1    593
Node    0, zone    DMA32, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable    870    530    391    157    103     41      9      2      1      0      0
Node    0, zone   Normal, type      Movable   5886   9235   5728   4072   1561    324    115     41     12      4  13018
Node    0, zone   Normal, type  Reclaimable      3      4      8     11      2      3      1      1      1      0      0
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

while a more fragmented server will have mostly low-order blocks:

Page block order: 9
Pages per block:  512

Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
Node    0, zone      DMA, type    Unmovable      0      1      1      0      2      1      1      0      1      0      0
Node    0, zone      DMA, type  Reclaimable      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Movable      0      0      0      0      0      0      0      0      0      1      3
Node    0, zone      DMA, type      Reserve      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone      DMA, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type    Unmovable    159      6      2      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type  Reclaimable      9   8271   6716      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Movable    589   8078   3128      9      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Reserve      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone    DMA32, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type    Unmovable   1373   1465   1173      2      0      0      0      0      0      0      0
Node    0, zone   Normal, type  Reclaimable     14      5     13      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Movable  16256  80265 156907    529     67      0      0      0      0      0      0
Node    0, zone   Normal, type      Reserve      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type          CMA      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
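The difference between the two sample outputs can be put into a single number. The sketch below is only an illustration, not a standard metric: it computes what share of a row's free memory sits in high-order blocks (order 8 and above, matching the "8, 9, 10" rule of thumb quoted above). The function names, the min_order default, and the 4KB page size are assumptions made for this example; the two input rows are the "zone Normal, type Movable" lines from the clean and fragmented outputs.

```python
PAGE_SIZE_KB = 4  # assumption: 4KB base pages, as in the quoted x86_64 example


def free_kb_by_order(counts):
    """Free memory in KB at each order: count * 2**order pages * page size."""
    return [n * (2 ** order) * PAGE_SIZE_KB for order, n in enumerate(counts)]


def high_order_share(counts, min_order=8):
    """Fraction of this row's free memory held in blocks of order >= min_order."""
    per_order = free_kb_by_order(counts)
    total = sum(per_order)
    if total == 0:
        return 0.0  # nothing free in this row
    return sum(per_order[min_order:]) / total


# "zone Normal, type Movable" rows from the two sample outputs
clean = [5886, 9235, 5728, 4072, 1561, 324, 115, 41, 12, 4, 13018]
fragmented = [16256, 80265, 156907, 529, 67, 0, 0, 0, 0, 0, 0]

if __name__ == "__main__":
    print("clean:      %.3f" % high_order_share(clean))
    print("fragmented: %.3f" % high_order_share(fragmented))
```

For the clean row almost all free memory is in orders 8 to 10, while the fragmented row has none there, even though it has far more free blocks in total at orders 0 to 2.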
Solution
Red Hat Enterprise Linux 7
- Upgrade to kernel-3.10.0-1127.el7 from Errata RHSA-2020:1016 or later
Red Hat Enterprise Linux 7.7
- Upgrade to kernel-3.10.0-1062.12.1.el7 from Errata RHSA-2020:0374 or later
Red Hat Enterprise Linux 7.6
- Upgrade to kernel-3.10.0-957.43.1.el7 from Errata RHSA-2020:0179 or later
Red Hat Enterprise Linux 7.5
- Upgrade to kernel-3.10.0-862.46.1.el7 from Errata RHSA-2020:0036 or later
Oracle Linux UEK
- This bug is fixed in V4.14.35-2047.505.1 and above.
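As a quick way to check a host against the versions listed above, the following Python sketch compares a kernel release string (as printed by uname -r) with those fixed builds. It is a simplified sketch under stated assumptions: it is not a full RPM version comparison, the function and table names are invented for this example, and it assumes a release string is of the form 3.10.0-<build>.el7... for RHEL 7 or 4.14.35-<build>.el7uek... for UEK 5. Anything it does not recognize returns None rather than a guess.

```python
# Minimum fixed builds per RHEL 7 z-stream, taken from the errata listed above.
# Key: the first number after "3.10.0-" (the base build of that minor release).
RHEL7_FIXED = {
    862: (862, 46, 1),    # RHEL 7.5
    957: (957, 43, 1),    # RHEL 7.6
    1062: (1062, 12, 1),  # RHEL 7.7
}
RHEL7_FIXED_FROM = 1127        # kernel-3.10.0-1127.el7 or later
UEK5_FIXED = (2047, 505, 1)    # 4.14.35-2047.505.1 or later


def release_numbers(release):
    """'3.10.0-1062.12.1.el7.x86_64' -> (1062, 12, 1); () if there is no '-'."""
    if "-" not in release:
        return ()
    numbers = []
    for part in release.split("-", 1)[1].split("."):
        if not part.isdigit():
            break  # stop at 'el7', 'el7uek', 'x86_64', ...
        numbers.append(int(part))
    return tuple(numbers)


def has_pagetypeinfo_fix(release):
    """True or False for the streams listed above, None when it cannot tell."""
    numbers = release_numbers(release)
    if not numbers:
        return None
    if release.startswith("3.10.0-") and ".el7" in release:
        if numbers[0] >= RHEL7_FIXED_FROM:
            return True
        fixed = RHEL7_FIXED.get(numbers[0])
        return None if fixed is None else numbers >= fixed
    if release.startswith("4.14.35-") and "uek" in release:
        return numbers >= UEK5_FIXED
    return None


if __name__ == "__main__":
    for rel in ("3.10.0-1062.el7.x86_64", "3.10.0-1062.12.1.el7.x86_64"):
        print(rel, has_pagetypeinfo_fix(rel))
```

On a live system the input could come from uname -r or from Python's platform.release(). A None result means the release is outside the streams listed in this article and should be checked against the vendor notes directly.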
References
Oracle Linux: CPU Hard Lockup Detected in get_page_from_freelist() Call of UEK5 Kernel (Doc ID 3000138.1)
https://access.redhat.com/solutions/4588841