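ORA-00600: internal error code, arguments: [Skgmfail] case
A reader ran into this problem on a production database and asked me about it while I was at home, so I did not record the detailed process; here is a rough account of the fix.
The database crashed after running for some time with ORA-00600: internal error code, arguments: [SKGMFAIL], [2], [4], [4], [1]; after a restart it ran normally again.
Environment: RHEL 6.4, Oracle RDBMS 11g R2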
ps -ef|grep smon
The check showed that the database instance processes had stopped. The cause of this error:
* symptom: Starting database fails
* symptom: ORA-00600: internal error code, arguments: [Skgmfail],[2], [1]
* cause: Unix kernel parameter max shared memory set too low
* fix: Check and set Unix kernel parameters
I had her check the OS kernel parameters with the command below; the file contained only a few net-related parameters.
sh# grep -v '^#' /etc/sysctl.conf | sed '/^[[:space:]]*$/d'
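To make the gap easier to see, the check can be turned into a small script that lists which of the parameters named in the Oracle note have no active entry in the file. This is only a sketch: the function name `missing_params` and the parameter list variable are hypothetical, and the list is copied from the note quoted in this post, so it would need adjusting for other versions.

```shell
#!/bin/sh
# Hypothetical helper: report the kernel parameters from the Oracle note
# that have no active (uncommented) entry in a sysctl configuration file.
REQUIRED_PARAMS="kernel.shmall kernel.shmmax kernel.shmmni kernel.sem fs.file-max fs.aio-max-nr net.ipv4.ip_local_port_range net.core.rmem_default net.core.rmem_max net.core.wmem_default net.core.wmem_max"

missing_params() {
    conf_file=$1
    for param in $REQUIRED_PARAMS; do
        # Escape the dots so they match literally, and allow leading
        # blanks plus optional blanks before the "=" sign.
        pattern="^[[:space:]]*$(printf '%s' "$param" | sed 's/\./\\./g')[[:space:]]*="
        if ! grep -Eq "$pattern" "$conf_file" 2>/dev/null; then
            printf '%s\n' "$param"
        fi
    done
}

# Only run against the real file when it exists and is readable.
if [ -r /etc/sysctl.conf ]; then
    missing_params /etc/sysctl.conf
fi
```

An empty result means every listed parameter has an entry; it says nothing about whether the values themselves are adequate.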
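Clearly the installation did not strictly follow the official documentation. Installing a database quickly is not what marks an expert; for every version it is best to follow the official documentation to avoid unnecessary trouble.
Requirements for Installing Oracle 11gR2 RDBMS on RHEL6 or OL6 64-bit (x86-64) (Doc ID 1441282.1)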
1. Modify your kernel settings in /etc/sysctl.conf (RedHat) as follows. If the current value for any parameter is higher than the value listed in this table, do not change the value of that parameter. Range values (such as net.ipv4.ip_local_port_range) must match exactly.

    kernel.shmall = physical RAM size / pagesize
        For most systems, this will be the value 2097152. See Note 301830.1 for more information.
    kernel.shmmax = 1/2 of physical RAM
        This would be the value 2147483648 for a system with 4GB of physical RAM. See Note:567506.1 for more information.
    kernel.shmmni = 4096
    kernel.sem = 250 32000 100 128
    fs.file-max = 512 x processes (for example 6815744 for 13312 processes)
    fs.aio-max-nr = 1048576
    net.ipv4.ip_local_port_range = 9000 65500
    net.core.rmem_default = 262144
    net.core.rmem_max = 4194304
    net.core.wmem_default = 262144
    net.core.wmem_max = 1048576

2. To activate these new settings into the running kernel space, run the "sysctl -p" command as root.

3. Set Shell Limits for the oracle User. Assuming that the "oracle" Unix user will perform the installation, do the following:

a.) Add the following settings to /etc/security/limits.conf

    oracle soft nproc 2047
    oracle hard nproc 16384
    oracle soft nofile 1024
    oracle hard nofile 65536
    oracle soft stack 10240

b.) Verify the latest version of PAM is loaded, then add or edit the following line in the /etc/pam.d/login file, if it does not already exist:

    session required pam_limits.so

c.) Verify the current ulimits, and raise if needed. This can be done many ways...adding the following lines to /etc/profile is the recommended method:

    if [ $USER = "oracle" ]; then
        if [ $SHELL = "/bin/ksh" ]; then
            ulimit -u 16384
            ulimit -n 65536
        else
            ulimit -u 16384 -n 65536
        fi
    fi
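The two shared memory formulas in the note are simple enough to compute on the host itself. The sketch below assumes a Linux system where `/proc/meminfo` and `getconf PAGE_SIZE` are available; the function names `calc_shmmax` and `calc_shmall` are made up for illustration. Note that `MemTotal` is slightly lower than the installed RAM, so the results are approximate starting points rather than exact values.

```shell
#!/bin/sh
# kernel.shmmax per the note: half of physical RAM, in bytes.
# $1 = physical RAM in bytes
calc_shmmax() {
    echo $(( $1 / 2 ))
}

# kernel.shmall per the note: physical RAM divided by the page size,
# which gives a value in pages.
# $1 = physical RAM in bytes, $2 = page size in bytes
calc_shmall() {
    echo $(( $1 / $2 ))
}

# On Linux, derive both values from the running system.
if [ -r /proc/meminfo ]; then
    mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
    mem_bytes=$(( mem_kb * 1024 ))
    page_size=$(getconf PAGE_SIZE)
    echo "kernel.shmmax = $(calc_shmmax "$mem_bytes")"
    echo "kernel.shmall = $(calc_shmall "$mem_bytes" "$page_size")"
fi
```

With the figures from the note, `calc_shmmax 4294967296` gives 2147483648 (4 GB of RAM), and `calc_shmall 8589934592 4096` gives 2097152 (8 GB of RAM with 4 KB pages). As the note says, a current value that is already higher should be left alone.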
Next we looked at the current memory state:
free -m
ipcs -m
tip:
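The host has 64 GB of physical memory and uses POSIX-style shared memory management. The database uses AMM with a target of 25 GB, and actual memory use was about 8 GB. The kernel parameter limits can be confirmed with the command below; in this case the values were indeed very small. In addition, after the abnormal crash the shared memory segments were not released; they were freed automatically once the OS was rebooted.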
# sysctl -a|grep shm
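For the leftover shared memory segments mentioned above, a filter over the `ipcs -m` output can show which ones have no attached process. This is a sketch under two assumptions: the column order is the usual Linux one (key, shmid, owner, perms, bytes, nattch, status), and a segment with `nattch` equal to 0 is a candidate for having been left behind. The function name `orphaned_segments` and the owner `oracle` are illustrative.

```shell
#!/bin/sh
# Print the shmid of every shared memory segment owned by the given user
# that has no attached processes. Reads `ipcs -m` style text on stdin.
# Assumed columns: key shmid owner perms bytes nattch [status]
# $1 = owner name to filter on
orphaned_segments() {
    awk -v owner="$1" '$1 ~ /^0x/ && $3 == owner && $6 == 0 { print $2 }'
}

# Run against the live system only if ipcs is installed.
if command -v ipcs >/dev/null 2>&1; then
    ipcs -m | orphaned_segments oracle
fi
```

In this case a reboot cleared the segments. Removing one by hand with `ipcrm -m <shmid>` is possible, but only after confirming that no instance process is still running.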
tip:
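The final fix was to reconfigure the kernel parameters in sysctl.conf, enable HugePages, and disable AMM. Along the way we also hit ORA-00371 and ORA-27102, but those were minor: create a pfile, adjust the parameters, and rebuild the spfile.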
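When enabling HugePages, the page count has to cover the whole SGA. A rough way to size `vm.nr_hugepages` is to divide the SGA size by the huge page size and round up, as sketched below. The function name `hugepages_needed` and the 8 GiB SGA figure are placeholders, not values recorded from this case; the huge page size is read from `/proc/meminfo` where that file is available.

```shell
#!/bin/sh
# Number of huge pages needed to hold an SGA, rounded up.
# $1 = SGA size in bytes, $2 = huge page size in kB
hugepages_needed() {
    sga_bytes=$1
    page_bytes=$(( $2 * 1024 ))
    echo $(( (sga_bytes + page_bytes - 1) / page_bytes ))
}

# On Linux, read the huge page size and print a candidate setting.
if [ -r /proc/meminfo ]; then
    hp_kb=$(awk '/^Hugepagesize:/ {print $2}' /proc/meminfo)
    if [ -n "$hp_kb" ]; then
        # The 8 GiB SGA below is a hypothetical example value.
        sga=$(( 8 * 1024 * 1024 * 1024 ))
        echo "vm.nr_hugepages = $(hugepages_needed "$sga" "$hp_kb")"
    fi
fi
```

For an 8 GiB SGA and 2048 kB huge pages this gives 4096 pages. The result is a lower bound for a single instance; a host running several instances needs the sum over all of their SGAs.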
Summary:
Installing and configuring a database is never just a matter of unpacking and installing.