Oracle 12c/19c ADR trace dest disk busy (100%) when ‘ls’ trace files
Recently I ran into several incidents after an upgrade to Oracle 12c: on the same hardware there were several instance crashes accompanied by the LGWR core process being flagged as "not moved for N seconds". In the OSWatcher data the vmstat 'b' column showed sudden spikes of blocked processes (usually blocked on I/O), mpstat/iostat showed the local file system hosting $ORACLE_BASE at 90-100% busy, and ps showed LGWR and several foreground (FG) processes waiting on the same OS kernel function address at the same time.
The environment is Oracle 12c R2, a 2-node RAC on HP-UX 11.31. Since 12c the number of trace files in the ADR trace directory has indeed grown compared with 11g and earlier, and there are also potential bugs such as the one recorded in 《Oracle12c R2注意事项: 又一个BUG 生成大量的trace 含 kjshash() kjqghd()》, where nearly 900,000 trace files were generated in a single day. If file system inode usage climbs too high, or too many small files accumulate in the same directory, file lookup efficiency degrades; Veritas recommends limiting a single VxFS directory to 100,000 files.
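As a quick check on those two conditions you can count the files in the ADR trace directory and look at inode usage on the file system; a minimal sketch using the paths from this environment (bdf -i is the HP-UX way to show inode usage):

# count files in the ADR trace directory
ls /oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace | wc -l

# inode usage of the file system hosting ORACLE_BASE (HP-UX: bdf -i reports iused/ifree/%iuse)
bdf -i /oracle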
In this case, inode usage on the local disk hosting the Oracle ADR was not high, yet an ls of the DB trace directory, which held only about 1,600 files, took nearly 4 minutes; while it ran the disk went 100% busy, and tracing the ls process with tusc showed the time being spent almost entirely in getdents calls. Two questions, then: which process is driving the intermittent disk busy, and how do we fix it?
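For reference, this is roughly how the measurement was taken (a sketch; tusc option availability can vary between versions, and the paths are this environment's):

# time the directory listing
timex ls /oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace > /dev/null

# trace the system calls of one ls run; most of the elapsed time showed up in getdents
tusc -o /tmp/ls.tusc ls /oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace > /dev/null
grep -c getdents /tmp/ls.tusc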
oracle@anbob2:/home/oracle> sar -d 1 3

HP-UX anbob2 B.11.31 U ia64    12/03/20

18:07:55   device    %busy   avque   r+w/s   blks/s   avwait   avserv
18:07:56    disk4    98.02    0.50     240     3784     0.00     4.11    # local_vg oracle lv
            disk5     0.00    0.50       1       16     0.00     0.21
         disk3054     0.00    0.50       4       14     0.00     0.07
         disk3022     0.00    0.50      20     1576     0.00     0.17
Checking with Glance
# By file system
Glance 12.02.008       14:51:35    anbob2      ia64                      Current   Avg   High
----------------------------------------------------------------------------------------------
CPU  Util     S  SU UW                                                  |   16%    18%    23%
Disk Util     F  F                                                      |   97%    99%   100%
Mem  Util     S  SU UF  F                                               |   53%    53%    53%
Network Util  U  UR                                                     |   51%    51%    51%
----------------------------------------------------------------------------------------------
                               IO BY FILE SYSTEM                                  Users=   15
Idx  File System    Device                  Type     Logl IO       Phys IO
----------------------------------------------------------------------------------------------
  1  /              /dev/vg00/lvol3         vxfs     na/   na      0.1/  1.0
  2  /stand         /dev/vg00/lvol1         vxfs     na/   na      0.1/  0.7
  3  /var           /dev/vg00/lvol8         vxfs     na/   na      0.1/  5.7
  4  /usr           /dev/vg00/lvol7         vxfs     na/   na      0.5/  0.8
  5  /topteact      /dev/.../lv_topteact    vxfs     na/   na      0.0/  0.0
  6  /tmp           /dev/vg00/lvol6         vxfs     na/   na      0.7/  0.7
  9  /ptfs          /dev/.../fslv_ptfs      vxfs     na/   na      0.0/  0.0
 10  /patrol        /dev/.../fslv_patrol    vxfs     na/   na      0.0/  1.2
 11  /oracle        /dev/.../fslv_oracle    vxfs     na/   na    246.1/275.5

# Sorted by disk (glance u)
Glance 12.02.008       14:51:51    anbob2      ia64                      Current   Avg   High
----------------------------------------------------------------------------------------------
CPU  Util     S  SU UW                                                  |   20%    18%    23%
Disk Util     F  F                                                      |   97%    99%   100%
Mem  Util     S  SU UF  F                                               |   53%    53%    53%
Network Util  U  UR                                                     |   51%    51%    51%
----------------------------------------------------------------------------------------------
                                  IO BY DISK                                      Users=   15
                                                           Logl         Phys         Serv
Idx  Device      Util      Qlen      KB/Sec                IO           IO           Time
----------------------------------------------------------------------------------------------
  1  disk5_p2    0.0/0.1    0.0       13.0/  108.2      na/   na      1.1/  9.6      0.00
  2  disk4       7.1/100    0.0       2087/   2367      na/   na    243.4/277.3      4.03
  3  disk3059    1.5/1.3    0.0      14155/  14388      na/   na    123.0/121.6      0.45
  4  disk3054    0.1/0.0    0.0        4.4/   12.6      na/   na      3.2/  3.8      0.58
  5  disk3020    0.9/0.8    0.0      150.7/  700.9      na/   na     17.6/ 38.3      0.54

# Sorted by process I/O (glance o)
Glance 12.02.008       17:53:19    anbob2      ia64                      Current   Avg   High
----------------------------------------------------------------------------------------------
CPU  Util     S  SU UW                                                  |   14%    14%    14%
Disk Util     F  F                                                      |   98%    99%    99%
Mem  Util     S  SU UF  F                                               |   54%    54%    54%
Network Util  U  UR                                                     |   51%    51%    51%
----------------------------------------------------------------------------------------------
                       INTERESTING PROCESS THRESHOLD OPTIONS
Display processes with resource usage:          Current Thresholds:
      CPU Utilization              >            (1.0   %      )
      Disk I/O Rate                >            (1.0   IOs/sec)
      Resident Set Size            >            (20    Mbytes )
      Virtual Set Size             >            (500   Mbytes )
      User name                    =            (all          )
      Program name                 =            (all          )
      TTY path name                =            (all          )
      Use match logic (and/or)     :            (or    )
      Sort key (name/cpu/disk/rss) :            disk   (disk  )
Glance started or last reset: 12/03/2020 17:53:12
Current refresh interval    : 5 seconds

Glance 12.02.008       18:04:55    anbob2      ia64                      Current   Avg   High
----------------------------------------------------------------------------------------------
CPU  Util     S  SU UW                                                  |   11%    13%    18%
Disk Util     F  F                                                      |   99%    78%   100%
Mem  Util     S  SU UF  F                                               |   54%    54%    54%
Network Util  U  UR                                                     |   51%    51%    52%
----------------------------------------------------------------------------------------------
                                 PROCESS LIST                                     Users=   12
                                 User        CPU %      Thrd   Disk      Memory           Block
Process Name      PID            Name     (9600% max)    Cnt  IO rate   RSS      VSS      On
----------------------------------------------------------------------------------------------
pmgreader        20491         itmuser       0.3         1     225.1   4.6mb    8.5mb    IO     <<<<<<<
ora_lg00_tbc      6511         oracle        1.3         1      63.8  34.2mb   47.9mb    SLEEP
replicat         15014         oracle        0.1         4      20.7  81.6mb    135mb    SLEEP
oracletbcse2      9258         oracle        0.5         1      18.0  42.2mb   72.6mb    PIPE
oracletbcse2     15113         oracle        0.1         1      16.5  34.5mb   51.3mb    PIPE
The local ORACLE_BASE sits on lv_oracle, mounted at /oracle, and the underlying device is disk4, made up of just two physical spinning disks, so at around 240 IOPS the disk busy already exceeds 90%. Glance also identified the process causing the intermittent disk busy: pmgreader, a locally deployed monitoring agent, reading the alert log in the Oracle ADR trace directory. After that monitor was stopped the disk busy no longer appeared. But the real root-cause question remains: why does an ls of a directory with just over 1,600 files have such a large performance impact?
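If you want to confirm which process has the alert log open at the moment of a spike, fuser is a cheap cross-check to glance (a sketch; the path is this environment's, and a process that opens and closes the file between samples may not be caught):

# list PIDs (and owners, with -u) that currently have the alert log open
fuser -u /oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace/alert_anbob2.log
# then map a PID back to its command line
ps -fp <pid_from_fuser>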
Last year, in the case 《Troubleshooting Slower IO Performance on Veritas for 11.2.0.4 compared 10gR2 on RAW device after RMAN migrate》, I ran into a slow-I/O problem that appeared after a database migration. That environment also used Veritas VxFS, and the root cause was excessive fragmentation in the file system.
For file systems smaller than 2 TB, VxFS uses a 1 KB block size by default, which is also the recommended block size for the Oracle binary directory. VxFS is organized somewhat like an Oracle segment: blocks are grouped into extents. A common rule of thumb is that when extents smaller than 8 KB account for more than 5% of the total extents, the file system is badly fragmented. The df -o s output below shows the free-extent size distribution for /oracle.
oracle@anbob2:/home/oracle> df -o s /oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace/
/oracle                (/dev/localvg/fslv_oracle) :
Free Extents by Size
          1:      67999            2:      81451            4:      83073
          8:     135547           16:     121548           32:     105307
         64:      87917          128:      84479          256:      56530
        512:      32626         1024:      15477         2048:       4959
       4096:        715         8192:         79        16384:          3
      32768:          3        65536:          1       131072:          0
     262144:          1       524288:          1      1048576:          1
    2097152:          2      4194304:          2      8388608:          1
   16777216:          0     33554432:          0     67108864:          0
  134217728:          0    268435456:          0    536870912:          0
 1073741824:          0   2147483648:          0
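To put a number on the rule of thumb above, the "size: count" pairs in that df -o s output can be summed with a bit of awk. A rough sketch (it counts free extents rather than kilobytes, and assumes the 1 KB block size mentioned earlier):

df -o s /oracle | awk '
  $1 ~ /^[0-9]+:$/ {                      # lines of "size: count" pairs, size in 1 KB blocks
      for (i = 1; i < NF; i += 2) {
          size = $i + 0; cnt = $(i + 1) + 0
          total += cnt
          if (size < 8) small += cnt      # free extents smaller than 8 KB
      }
  }
  END { if (total) printf "free extents < 8KB: %d of %d (%.1f%%)\n", small, total, 100 * small / total }'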
Checking file system fragmentation
oracle@anbob2:/home/oracle> vxfsstat -b /oracle
buffer cache statistics
      208512 Kbyte current     4221440 maximum
 88821047312 lookups            95.42% hit rate
        5501 sec recycle age [not limited by maximum]

oracle@anbob2:/opt/VRTS/bin> ls -l /oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace/alert*.log
-rw-r----- 1 oracle asmadmin 18217303 Dec 10 16:31 /oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace/alert_anbob2.log

oracle@anbob2:/opt/VRTS/bin> ./fsmap -a /oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace/alert*.log | grep -v Volume | awk '{print $4}' | sort | uniq -c | sort -nk2
   1055 1024
     76 2048
     14 3072
     14 4096
      5 5120
      4 6144
      2 7168
     15 8192
      2 9216
      3 11264
      1 13312
      1 15360
      7 16384
      1 22528
      1 31744
      3 32768
      1 35840
      1 65536
      1 75776
      1 81920
      1 197632
      1 206848
      1 262144
      1 312320
      1 314368
      1 393216
      1 524288
      1 676864
      1 698368
      1 779264
      1 929792
      2 1048576
      1 1319936
      3 2097152
      1 3179520
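VxFS can also produce its own fragmentation reports with fsadm. A sketch, run as root and assuming the HP-UX fsadm_vxfs options (-D for the directory fragmentation report, -E for the extent fragmentation report; check the fsadm_vxfs(1M) man page on your release):

# report-only: print the directory (-D) and extent (-E) fragmentation reports for /oracle
fsadm -F vxfs -D -E /oracle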
If this directory once held hundreds of thousands of trace files because of the Oracle bug, the directory's VxFS metadata could have grown very large. Even if the files were deleted promptly, without a defragmentation pass that metadata may never be reclaimed, so an ls would still have to read all of it (not sure), much like an Oracle full table scan. Oracle, however, has a buffer cache to cut down on physical reads: does VxFS have an equivalent? I later asked HP support, and the answer is yes: vxfs_bc_bufhwm. It turned out that on this host a previous integrator had, for reasons unknown, changed this parameter and set it to an unreasonable explicit value. The default should be 0, which means auto-tuned (the output below shows the value after it was restored to the default).
oracle@anbob2:/home/oracle> kctune |grep vxfs
vxfs_bc_bufhwm              0           0  Immed
vxfs_ifree_timelag         -1          -1  Immed
NAME
vxfs_bc_bufhwm - VxFS buffer cache high water mark (determines the VxFS buffer cache size)
SYNOPSIS
/usr/bin/kctune -s vxfs_bc_bufhwm={maximum size of buffer cache}
VALUE
Specify an integer value.
Minimum
6144 KB
Maximum
90% of kernel memory.
Default
0

DESCRIPTION
VxFS maintains a buffer cache in the kernel for frequently accessed
file system metadata in addition to the HP-UX kernel buffer cache that
caches the file data. vxfs_bc_bufhwm tunable parameter determines the
size of the VxFS buffer cache (the maximum amount of memory that can
be used to cache VxFS metadata).
// The maximum size of the metadata buffer cache is set (auto-tuned) at boot time based on system memory size, provided that the value of vxfs_bc_bufhwm is set to zero (default).
Like with the tunable vx_ninode, a large metadata buffer cache can help improve file system performance, especially during metadata-intensive loads (stat, create, remove, link, lookup operations).

NOTES
Use the vxfsstat command to monitor buffer cache statistics and inode cache usage. See the vxfsstat(1M) manual page.
Setting the vxfs_bc_bufhwm value too low can result in a system hang. Set the value of vxfs_bc_bufhwm to 5% or more of the system’s total physical memory if the system has 8 GB or less physical memory. Set the value of vxfs_bc_bufhwm to 2% or more of the system’s total physical memory if the system has more than 8 GB of physical memory. The higher the physical memory of the system, the lower you can set vxfs_bc_bufhwm. You can set vxfs_bc_bufhwm to as low as 0.5% of the system’s total physical memory if the system has much more than 8 GB of memory.
EXAMPLES
The following command sets the maximum size of the buffer cache to 300000:

# kctune -s vxfs_bc_bufhwm=300000
WARNINGS
Incorrectly tuning a parameter may adversely affect system performance. See the Storage Foundation Administrator's Guide for more information about tuning parameters.
VxFS kernel tunable parameters are release specific. This parameter may be removed or the default value and usage may change in future releases. See the Storage Foundation Release Notes for information about changes to parameters, if any.
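To make the sizing guidance in the NOTES concrete: for a host with, say, 64 GB of physical memory (a hypothetical figure, not this system's), the 2% lower bound works out as below.

# hypothetical example: 2% of 64 GB of physical RAM, expressed in KB for kctune
PHYS_KB=$((64 * 1024 * 1024))        # 64 GB in KB (assumed figure)
echo $((PHYS_KB * 2 / 100))          # about 1342177 KB, i.e. roughly 1.3 GB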
Solution
Try adjusting the kernel parameter:
/usr/bin/kctune -s vxfs_bc_bufhwm=0

oracle@anbob2:/home/oracle> vxfsstat -b /oracle
buffer cache statistics
      688512 Kbyte current     4221440 maximum
 88824047314 lookups            95.42% hit rate
        5501 sec recycle age [not limited by maximum]
Run ls once more so that the directory metadata is read from disk; the second run is back to normal and returns in a flash.
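A simple way to verify is to time the listing twice and re-check the cache statistics; on the second run the directory metadata should come from the enlarged VxFS buffer cache (a sketch using this environment's paths):

timex ls /oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace > /dev/null   # first run: physical reads
timex ls /oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace > /dev/null   # second run: served from metadata cache
vxfsstat -b /oracle                                                        # current size / hit rate should reflect the change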
Another workaround is to rename the trace directory, create a new trace directory, and then copy the files back into it. Note that this must be done with the instance shut down (a rough sketch of the steps follows below).
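A rough sketch of that procedure, assuming the instance (and anything else writing into ADR) is already shut down; paths and ownership are this environment's, adjust as needed:

TRC=/oracle/app/oracle/diag/rdbms/prianbob/anbob2/trace

mv "$TRC" "${TRC}.old"            # move the directory with the bloated metadata out of the way
mkdir "$TRC"                      # a fresh directory starts with compact metadata
chown oracle:asmadmin "$TRC"      # ownership as seen on the alert log above
chmod 750 "$TRC"
cp -p "${TRC}.old"/* "$TRC"/      # copy the trace files back
# verify, restart the instance, then remove ${TRC}.old once satisfied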