Troubleshooting Oracle 19c RAC: CSSD process cannot get real-time priority
When the CSSD process cannot obtain real-time priority and therefore does not run in real-time, it can cause various high-availability (HA) issues. Starting with 19c, this is treated as a fatal error: CSS cannot start normally if it fails to get real-time priority.
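When the stack is up, you can also check the scheduling class of the CSSD daemon directly. The sketch below uses hypothetical helper names (they are not Oracle tooling). It assumes the daemon's command name is ocssd.bin and that procps ps is installed. On a healthy node you would expect a real-time class: RR for SCHED_RR or FF for SCHED_FIFO.

```shell
#!/usr/bin/env bash
# Sketch: report whether a running ocssd.bin holds a real-time scheduling class.
# Assumptions: the command name is "ocssd.bin" and procps "ps" is available.

# Return 0 if a ps "cls" value is a real-time class (RR = SCHED_RR, FF = SCHED_FIFO).
is_rt_class() {
  case "$1" in
    RR|FF) return 0 ;;
    *)     return 1 ;;
  esac
}

# Print PID, class and real-time priority of ocssd.bin, then judge the class.
# Returns 0 if real-time, 1 if not, 2 if the process is not running.
check_ocssd_rt() {
  local line
  # -C selects processes by command name; "=" after each field suppresses headers.
  line=$(ps -C ocssd.bin -o pid=,cls=,rtprio= 2>/dev/null | head -n 1)
  if [ -z "$line" ]; then
    echo "ocssd.bin is not running"
    return 2
  fi
  # Split "PID CLS RTPRIO" into positional parameters (local to this function).
  set -- $line
  echo "pid=$1 class=$2 rtprio=$3"
  if is_rt_class "$2"; then
    echo "real-time: yes"
  else
    echo "real-time: no"
    return 1
  fi
}
```

Usage: check_ocssd_rt. A class of TS (SCHED_OTHER) means the daemon is not running in real-time.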
GI alert log
2022-08-26 12:23:20.316 [OCSSD(5740)]CRS-1713: CSSD daemon is started in hub mode.
2022-08-26 12:23:20.418 [OCSSD(5740)]CRS-1726: Process failed to run in real-time priority.
2022-08-26 12:23:20.419 [OCSSD(5740)]CRS-1656: The CSS daemon is terminating due to a fatal error.
2022-08-26T12:23:21.430124+09:00
Errors in file /opt/app/grid/diag/crs//crs/trace/ocssd.trc (incident=1):
CRS-8503 [] [] [] [] [] [] [] [] [] [] [] []
Incident details in: /opt/app/grid/diag/crs//crs/incident/incdir_1/ocssd_i1.trc
2022-08-26 12:23:21.420 [OCSSD(5740)]CRS-8503: Oracle Clusterware process OCSSD with operating system process ID 5740 experienced fatal signal or exception code 6.
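To find this failure in a GI alert log, searching for the two message codes is usually enough. The function below is a hypothetical helper. The alert log path is site-specific, so the function takes it as an argument instead of assuming one.

```shell
#!/usr/bin/env bash
# Sketch: list GI alert log lines showing CSSD failing to get real-time
# priority (CRS-1726) and then terminating (CRS-1656).
# The log path is passed in; it is site-specific and not assumed here.

find_rt_priority_failures() {
  local log="$1"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  # -n prefixes each match with its line number; -E enables the alternation.
  grep -nE 'CRS-1726|CRS-1656' "$log"
}
```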
CSSD log
2022-08-26 12:23:20.418 : CSSD:2872900864: [ INFO] clssscInitGlobalCTX: Environment is production
2022-08-26 12:23:20.418 : CSSD:2872900864: [ INFO] (:CLSN00143:)clssscInitGlobalCTX: CSSD process cannot get real-timepriority
2022-08-26 12:23:20.418 : CSSD:2872900864: [ INFO] clsssc_logose: slos [-2], SLOS depend-msg [No such file or directory], SLOS error-msg [2]
2022-08-26 12:23:20.418 : CSSD:2872900864: [ INFO] clsssc_logose: SLOS other info is [process is not running in real-time. rc = 0].
2022-08-26 12:23:20.418 : CSSD:2872900864: [ INFO] (:CLSN00143:)clssscInitGlobalCTX: set priority system call had failed
2022-08-26 12:23:20.418 : CSSD:2872900864: [ INFO] (:CLSN00143:)clssscInitGlobalCTX: set priority system call had failed calling
Possible solutions
CASE 1
[root@oel7db1 ~]# sysctl -a 2>/dev/null | grep runtime
kernel.sched_rt_runtime_us = 950000
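If the value is not -1, add the following line at the bottom of /etc/sysctl.conf and apply it with sysctl -p (or reboot). A value of -1 removes the limit on how much CPU time real-time tasks may consume: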
kernel.sched_rt_runtime_us=-1
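The following minimal sketch makes this change idempotent, so running it twice does not append a second line. The helper names are hypothetical. It assumes sysctl.conf holds plain key = value lines. When a key appears more than once, sysctl -p applies the lines in order, so the last assignment wins.

```shell
#!/usr/bin/env bash
# Sketch: persist a sysctl setting (e.g. kernel.sched_rt_runtime_us=-1)
# at the bottom of a sysctl.conf-style file without duplicating it.
# Writing /etc/sysctl.conf requires root.

# Print the value of the last uncommented "key = value" line for key $1 in file $2.
last_sysctl_value() {
  awk -F'=' -v k="$1" '
    { name = $1; gsub(/[ \t]/, "", name) }
    name == k { val = $2; gsub(/[ \t]/, "", val) }
    END { print val }
  ' "$2" 2>/dev/null
}

# Append "key=value" unless the last assignment of key already has that value.
ensure_sysctl_setting() {
  local key="$1" value="$2" conf="${3:-/etc/sysctl.conf}"
  if [ "$(last_sysctl_value "$key" "$conf")" = "$value" ]; then
    echo "$key already set to $value in $conf"
    return 0
  fi
  printf '%s=%s\n' "$key" "$value" >> "$conf"
  echo "appended $key=$value to $conf"
}
```

Usage, as root: ensure_sysctl_setting kernel.sched_rt_runtime_us -1 && sysctl -p. sysctl -p reads /etc/sysctl.conf and applies it to the running kernel.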
CASE 2
The CPU accounting controller (cpuacct) uses cgroups to group tasks and accounts for the CPU usage of each group.
Starting with Grid Infrastructure 19c, the cssd process must start with real-time scheduling priority. Setting real-time scheduling priority can fail on systems where systemd CPU accounting is enabled.
First, check whether the system has cgroup CPU accounting enabled. Run the following as the root user:
# ls /sys/fs/cgroup/cpu,cpuacct | grep slice
If this command returns any output, the server has CPU accounting enabled and is vulnerable to this issue. Note that some operating system security hardening software enables CPU accounting.
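The same check can be wrapped in a function for use in a health-check script. This is a sketch with a hypothetical function name. It keeps the cgroup v1 path used above as the default. Like the manual check, it treats any entry whose name contains "slice" as a sign that CPU accounting is enabled.

```shell
#!/usr/bin/env bash
# Sketch: the manual CPU accounting check as a reusable function.
# Default directory is the cgroup v1 cpu,cpuacct mount used above.

# Return 0 if the directory lists any entry containing "slice", 1 otherwise
# (including when the directory does not exist).
cpu_accounting_enabled() {
  local dir="${1:-/sys/fs/cgroup/cpu,cpuacct}"
  ls "$dir" 2>/dev/null | grep -q slice
}
```

Usage, as root: if cpu_accounting_enabled; then echo "CPU accounting enabled"; fi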
CASE 3
The following workaround can be used until the fix for bug 33610957 is applied.
Set Delegate=yes in the [Service] section of the /etc/systemd/system/oracle-ohasd.service file, then reload the unit definitions with "systemctl daemon-reload". If the reload fails, reboot the node.
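The following sketch applies this workaround from a script. The function name is hypothetical. It assumes GNU sed (for sed -i and the one-line a command) and keeps a .bak copy of the unit file. If a different Delegate= line already exists, it stops instead of appending, because that case needs a manual decision.

```shell
#!/usr/bin/env bash
# Sketch of the bug 33610957 workaround: add Delegate=yes to the [Service]
# section of oracle-ohasd.service. Run as root; a .bak copy is kept.

add_delegate_yes() {
  local unit="${1:-/etc/systemd/system/oracle-ohasd.service}"
  # Nothing to do if the directive is already present.
  if grep -q '^[[:space:]]*Delegate=yes[[:space:]]*$' "$unit"; then
    echo "Delegate=yes already present in $unit"
    return 0
  fi
  # Any other Delegate= value needs a human decision, not a blind append.
  if grep -q '^[[:space:]]*Delegate=' "$unit"; then
    echo "$unit already has a different Delegate= line; edit it manually" >&2
    return 1
  fi
  if ! grep -q '^\[Service\]' "$unit"; then
    echo "no [Service] section found in $unit" >&2
    return 1
  fi
  cp -p "$unit" "$unit.bak"
  # GNU sed one-line "a" command: insert Delegate=yes right after the header.
  sed -i '/^\[Service\]/a Delegate=yes' "$unit"
}
```

Usage, as root: add_delegate_yes && systemctl daemon-reload. systemd does not load the .bak copy because its name does not end in .service.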
Before 19c
Clusterware processes such as OCSSD may report errors if the Docker Engine RPM is installed. To check:
# rpm -qa | grep docker
References
Doc ID 2714854.1