A Few Things About ora.crf (CHM)
Oracle database environments, RAC environments in particular, place very strict demands on the underlying infrastructure. CPU shortage, memory pressure, network issues, or I/O problems frequently cause database hangs or split-brain evictions, and without OS-level data to back up the diagnosis, the SA and the DBA can easily end up in an awkward finger-pointing standoff. Oracle went to great lengths here and introduced the CHM tool; bundling tools like this is one reason the Oracle installation media keep growing. From 12c onward there is even a dedicated repository database (GIMR) for collecting this information, plus tools such as SQL Developer, SQLcl, and oratop. This is precisely one of Oracle's great qualities.
ora.crf is the resource through which the Cluster Health Monitor (CHM) provides its service; it automatically collects operating-system usage data (CPU, memory, swap, processes, I/O, network, and so on). For production environments I still recommend also deploying OSWatcher to keep a longer history. OSWatcher invokes OS commands, while CHM calls OS APIs, which gives it lower overhead and better real-time behavior. Early versions sampled once per second; from 11.2.0.3 the interval appears to have changed to once every 5 seconds.
CHM is installed automatically with the following software:
- Oracle Grid Infrastructure 11.2.0.2 and later for Linux (excluding Linux Itanium) and Solaris (SPARC 64 and x86-64)
- Oracle Grid Infrastructure 11.2.0.3 and later for AIX and Windows (excluding Windows Itanium).
On earlier versions, CHM must be installed separately. It can also be installed in non-RAC environments.
CHM consists mainly of two services:
1). System Monitor Service (osysmond): this service runs on every node. osysmond sends each node's resource-usage data to the Cluster Logger Service, which receives the data from all nodes and stores it in the CHM repository.
$ ps -ef | grep osysmond
root  7984  1  0 Jun05 ?  01:16:14 /u01/app/11.2.0/grid/bin/osysmond.bin
2). Cluster Logger Service (ologgerd): within a cluster, ologgerd runs as a master on one node and as a standby on another. If ologgerd runs into trouble and can no longer run on the current node, it is started on the standby node.
Master node:
$ ps -ef | grep ologgerd
root  8257  1  0 Jun05 ?  00:38:26 /u01/app/11.2.0/grid/bin/ologgerd -M -d /u01/app/11.2.0/grid/crf/db/rac2
Standby node:
$ ps -ef | grep ologgerd
root  8353  1  0 Jun05 ?  00:18:47 /u01/app/11.2.0/grid/bin/ologgerd -m rac2 -r -d /u01/app/11.2.0/grid/crf/db/rac1
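As the ps output above shows, the master and the standby are distinguishable by their command-line flags: `-M` for the master, `-m <master-node>` for the standby. A minimal sketch that classifies a process line this way (the `classify_ologgerd` helper name is mine, not an Oracle tool; real input would come from `ps -ef | grep [o]loggerd`):

```shell
# Classify an ologgerd command line as master or standby from its flags:
#   "-M"        -> master
#   "-m <node>" -> standby (replicates from the named master node)
classify_ologgerd() {
  case "$1" in
    *" -M "*) echo "master" ;;
    *" -m "*) echo "standby" ;;
    *)        echo "unknown" ;;
  esac
}

sample="/u01/app/11.2.0/grid/bin/ologgerd -M -d /u01/app/11.2.0/grid/crf/db/rac2"
classify_ologgerd " $sample "   # prints "master"
```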
CHM diagnostic logs:
$GRID_HOME/log/*/crflogd/crflogd.log
$GRID_HOME/log/*/crfmond/crfmond.log
CHM Repository:
This is where the collected data is stored. By default it lives under $GI_HOME/crf and requires 1 GB of disk space; each node adds roughly 0.5 GB per day. You can use OCLUMON to change its location and its allowed size (at most 3 days of data can be retained).
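A side note on sizing: the repository size that oclumon reports and accepts is a retention period in seconds, not bytes (see the `repsize` output further below). A small sketch of the arithmetic, assuming you want the 3-day maximum:

```shell
# repsize is a retention period expressed in seconds.
# 3 days is the maximum retention, which is the default value of 259200.
days=3
repsize=$(( days * 24 * 60 * 60 ))
echo "$repsize"    # prints 259200
# Applying it on a live cluster would then be (commented out here):
#   oclumon manage -repos resize $repsize
```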
There are two ways to obtain the data CHM generates:
1. One is to use Grid_home/bin/diagcollection.pl:
$ Grid_home/bin/diagcollection.pl -collect -chmos -incidenttime inc_time -incidentduration duration
e.g.
$ diagcollection.pl -collect -crshome /u01/app/11.2.0/grid -chmoshome /u01/app/11.2.0/grid -chmos -incidenttime "06/15/201412:30:00" -incidentduration "00:05"
2. The other way to obtain the CHM data is oclumon:
$ oclumon dumpnodeview [[-allnodes] | [-n node1 node2] [-last "duration"] | [-s "time_stamp" -e "time_stamp"] [-v] [-warning]] [-h]
e.g.
$ oclumon dumpnodeview -allnodes -v -s "2012-06-15 07:40:00" -e "2012-06-15 07:57:00" > /tmp/chm1.txt
Using oclumon to detect potential root causes for node evictions (CPU starvation):
$ oclumon dumpnodeview -n grac2 -last "00:15:00"
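The dumpnodeview output is essentially token/value pairs, so it is easy to post-process (the full script at the end of this post does exactly that). A minimal sketch; the sample line below is illustrative, modeled on the SYSTEM metric line, with field names as they appear in real output:

```shell
# Pull a few key metrics (cpu, cpuq, physmemfree) out of a dumpnodeview
# SYSTEM line. The sample record is illustrative; real input would come
# from e.g.: oclumon dumpnodeview -n rac2 -last "00:15:00"
sample='#pcpus: 2 #vcpus: 2 cpuht: N chipname: Intel(R) cpu: 5.27 cpuq: 1 physmemfree: 89492 mcache: 224720 swapfree: 1785120'

echo "$sample" | awk '{
  for (i = 1; i <= NF; i++) {
    if      ($i == "cpu:")         cpu = $(i+1);
    else if ($i == "cpuq:")        cpuq = $(i+1);
    else if ($i == "physmemfree:") memfree = $(i+1);
  }
  printf("cpu=%s cpuq=%s physmemfree=%s\n", cpu, cpuq, memfree);
}'
# prints: cpu=5.27 cpuq=1 physmemfree=89492
```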
Stopping and disabling the ora.crf resource:
On each node, as root user:
# $GI_HOME/bin/crsctl stop res ora.crf -init
# $GI_HOME/bin/crsctl modify res ora.crf -attr ENABLED=0 -init
Problem 1
An oversized CHM repository drives the GI_HOME filesystem usage up, or the ologgerd process consumes CPU at nearly 100%. Normally each node adds about 0.5 GB per day.
  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
  824 root  RT  -5  368m 142m  58m R 99.6  0.1  1:01.77 ologgerd
# cd $GI_HOME/crf/db/
# ls -lstr *.bdb
In the case I handled this time, the crfclust.bdb file had reached 37 GB.
Cleanup methods:
A. Manual cleanup
1. Stop CRF, as root user:
# $GI_HOME/bin/crsctl stop res ora.crf -init
2. Back up or remove the bdb files:
# cd $GI_HOME/crf/db/
# for f in *.bdb; do mv "$f" "$f.backup"; done
3. Start CRF, as root user:
# $GI_HOME/bin/crsctl start res ora.crf -init

B. On 11.2.0.3 and later, oclumon manage -repos can be used to control the size and free up space.
Check the current settings:
$ oclumon manage -get reppath
CHM Repository Path = /u01/app/11.2.0/grid/crf/db/rac2
Done
$ oclumon manage -get repsize
CHM Repository Size = 259200 <==== in seconds
Done
Sometimes you will get a very large value; that indicates a problem, e.g.:
$ oclumon manage -get repsize
CHM Repository Size = 1094795585
Change the path:
$ oclumon manage -repos reploc /shared/oracle/chm
Change the size:
$ oclumon manage -repos resize 259200
Restart crf on both nodes:
$ crsctl stop res ora.crf -init
$ crsctl start res ora.crf -init
The .bdb files are reinitialized as well.

C. Another temporary workaround: kill the ologgerd process and clean out the directory; osysmond will automatically respawn ologgerd and generate new bdb files.

Bug 20186278 crfclust.bdb Becomes Huge Size Due to Sudden Retention Change
Bug 13950866 Disk usage is 100% due to ora.crf resource
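The backup step of the manual cleanup above is the only part that needs care when several bdb files are present (a plain `mv *.bdb *.bdb.backup` does not do what it looks like). A minimal sketch; `backup_bdb_files` is a hypothetical helper of mine, to be wrapped between the crsctl stop/start commands shown above:

```shell
# Rename every *.bdb file in the CHM repository directory to *.bdb.backup.
# Run only while ora.crf is stopped:
#   crsctl stop res ora.crf -init   ...   crsctl start res ora.crf -init
backup_bdb_files() {
  dir="$1"
  for f in "$dir"/*.bdb; do
    [ -e "$f" ] || continue      # glob did not match: no .bdb files, nothing to do
    mv "$f" "$f.backup"
  done
}
```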
Problem 2
ora.crf is in ONLINE state, but -get reppath fails:
$ oclumon manage -get reppath
CRS-9011-Error manage: Failed to initialize connection to the Cluster Logger Service
But the "status & target" of resource ora.crf are ONLINE on all nodes:
# crsctl stat res ora.crf -init
NAME=ora.crf
TYPE=ora.crf.type
TARGET=ONLINE
STATE=ONLINE on dibarac01
BUG 17238613 – LNX64-11204-CHM:OLOGGERD WAS DISABLED BECAUSE BDB GROWN BEYOND DESIRED LIMITS
BUG 20439706 – DB_KEYEXIST: KEY/DATA PAIR ALREADY EXISTS ERROR IN CRFLOGD.LOG
BUG 18447164 – CRFCLUST.BDB GROW HUGE SIZE
BUG 19692024 – EXADATA: CRFCLUST.BDB IS GROWING TO 40 GB
BUG 20127477 – CRFCLUST.BDB HAS GROWN UNEXPECTEDLY
BUG 20316849 – HUGE REPSIZE RESULTING IN GI HOME DIRECTORY FILLING UP
BUG 20351845 – RETENTION FOR CHM DATA IS SET TO 34YRS

All of these are closed as duplicates of the following:
BUG 20186278 – TAG OCR: GET ID FAILED AND CHM DB SIZE 24 GB
Problem 3
diagsnap, introduced in 12.1, is also managed by CHM. There is a problem when osysmond executes pstack, which can likewise lead to instance restarts or node evictions.
Workaround:
1. Disable osysmond from issuing pstack. As root user, issue:
crsctl stop res ora.crf -init
Update PSTACK=DISABLE in $GRID_HOME/crf/admin/crf.ora
crsctl start res ora.crf -init
2. Disable diagsnap. As GI user, issue:
$GI_HOME/bin/oclumon manage -disable diagsnap
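The crf.ora edit in step 1 can be scripted. A sketch, assuming GNU sed and a `NAME=VALUE` file layout in which the PSTACK parameter may or may not already be present; `set_pstack_disable` is my name for the helper:

```shell
# Set PSTACK=DISABLE in a crf.ora-style NAME=VALUE file:
# replace the line if the parameter already exists, append it otherwise.
# Note: "sed -i" is GNU sed; on AIX/Solaris, edit a copy and move it back.
set_pstack_disable() {
  f="$1"
  if grep -q '^PSTACK=' "$f"; then
    sed -i 's/^PSTACK=.*/PSTACK=DISABLE/' "$f"
  else
    echo 'PSTACK=DISABLE' >> "$f"
  fi
}
```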
Problem 4
ora.crf fails to start with 'Secondary index corrupt: not consistent with primary' in $GRID_HOME/log/*/crflogd/crflogd.log.
Solution
Rebuild the BDB databases manually; the procedure is the same as for Problem 1. But take note of the term "secondary index", a second or subordinate index. Since when do indexes come with an order? This is rather interesting, because in Oracle 19c the secondary index takes on an even bigger role. In the next post we will talk about the secondary index in 19c.
A shell script for analyzing CHM data (from the Internet):
#!/bin/bash
# Description:
#   Convert CHM files to a more human readable format like vmstat, ...
#   - move the MEM Low and CPU high messages to the end of the line
#   - display data in a tabular format
#
# Usage: ./print_sys.sh grac41_CHMOS
#   grac41_CHMOS = oclumon output from: tfactl diagcollect
#
# Run a report for System Metrics from 16.01.00 - 16.01.59
#   % ~/print_sys.sh grac41_CHMOS | egrep '#pcpus|cpuq:|03-22-14 10.00'
# Output
#   pcpus: 2 #vcpus: 2 cpuht: N chipname: Intel(R) swaptotal: 5210108 physmemtotal: 4354292 #sysfdlimit: 6815744 #disks: 27 #nics: 6
#   cpu: cpuq: memfree: mcache: swapfree: ior: iow: ios: swpin: swpout: pgin: pgout: netr: netw: procs: rtprocs: #fds: nicErrors:
#   03-22-14 10.00.03  2.60 6 86356 215692 1811240   16    1  11    6    0   17    1 41  7 378 15 19648 0
#   03-22-14 10.00.13  5.27 1 89492 224720 1785120 8444 8528 166 2764 3414 4437 3497 41 12 381 15 19680 0
#   03-22-14 10.00.18  5.87 1 96180 227256 1776196 7682 5508 534 2004 2400 3762 2524 47 10 388 15 19712 0
#   ..

# pcpus indicates a SYSTEM Metric report
search1="pcpus"

# remove any ; and ' from each line - simplifies processing
cat $1 | sed 's/;/ /g' | sed 's/'\''//g' | awk 'BEGIN { cnt=0; }
/Node:/ { Node=$0; Nodet1=$4; Nodet2=$5; }
/'$search1'/ {
  if ( $1=="#pcpus:" ) {
    if ( cnt==0 ) {
      cnt++;
      # print header: number of CPUs and Chip Identity
      printf ("%s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s %s \n", \
        $1, $2, $3, $4, $5, $6, $7, $8, $21, $22, $15, $16, $47, $48, $49, $50, $51, $52);
      printf ("  cpu: cpuq: memfree: mcache: swapfree: ior: iow: ios: swpin:");
      printf (" swpout: pgin: pgout: netr: netw: procs: rtprocs: #fds: nicErrors: \n");
    }
    cnt++;
    for (i = 1; i <= NF; i++) {
      if      ($i=="cpu:")         { memlow=""; cpu=$(i+1); i++; }
      else if ($i=="cpuq:")        { cpuq=$(i+1); i++; }
      else if ($i=="physmemfree:") { physmemfree=$(i+1); i++; }
      else if ($i=="mcache:")      { mcache=$(i+1); i++; }
      else if ($i=="swapfree:")    { swapfree=$(i+1); i++; }
      else if ($i=="ior:")         { ior=$(i+1); i++; }
      else if ($i=="iow:")         { iow=$(i+1); i++; }
      else if ($i=="ios:")         { ios=$(i+1); i++; }
      else if ($i=="swpin:")       { swpin=$(i+1); i++; }
      else if ($i=="swpout:")      { swpout=$(i+1); i++; }
      else if ($i=="pgin:")        { pgin=$(i+1); i++; }
      else if ($i=="pgout:")       { pgout=$(i+1); i++; }
      else if ($i=="netr:")        { netr=$(i+1); i++; }
      else if ($i=="netw:")        { netw=$(i+1); i++; }
      else if ($i=="procs:")       { procs=$(i+1); i++; }
      else if ($i=="rtprocs:")     { rtprocs=$(i+1); i++; }
      else if ($i=="#fds:")        { fds=$(i+1); i++; }
      else if ($i=="nicErrors:")   { nicErrors=$(i+1); i++; }
      else if ($i=="total-mem") {
        # Record detection of a LOW memory indication, e.g.:
        #   Available memory (physmemfree 91516 KB + swapfree 185276 KB) on node grac41
        #   is Too Low (< 10% of total-mem + total-swap)
        # Search for total-mem and select field i-2, which is the 10% in the above case
        memlow = $(i-2);
      }
    }
    printf ("%s %s %6s %3d %9s %9s %9s %5s %5s %5s %5s %5s %5s %5s %5d %5d %5d %5d %5d %5d ", \
      Nodet1, Nodet2, cpu, cpuq, physmemfree, mcache, swapfree, ior, iow, ios, swpin, \
      swpout, pgin, pgout, netr, netw, procs, rtprocs, fds, nicErrors );
    if ( cpu > 90 ) {
      # Flag a HIGH CPU usage indication
      printf (" CPU > 90%% ");
    }
    if ( memlow != "" ) { printf(" MEMLOW < %s", memlow); }
    printf("\n");
  }
} '