LMSn not running in RT (real-time) mode on Oracle 19c RAC?
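When CPU on a database host is close to exhaustion, Oracle wants a few core background processes to get CPU at the highest possible priority. High CPU usage also lengthens I/O response times and increases network latency, which indirectly hurts overall database performance. Since Oracle 10g, the hidden parameter _high_priority_processes has controlled which processes run at high priority. In 19c there is an additional parameter, _highest_priority_processes, which gives VKTM the highest priority. In 10.2 the default value of _high_priority_processes covered the core RAC processes LMS*; in 11g it covered LMS*|VKTM; in 19c VKTM gets the highest priority and _high_priority_processes covers a longer list: LMS*|LM*|LCK0|GCR*|CKPT|DBRM|RMS0|LGWR|CR*|RMV*. As far as I remember, before 10.2.0.3 there was a bug that could make these processes use excessive CPU. Recently a customer saw pronounced GC (global cache) problems once CPU usage exceeded 90%. While checking the LMS processes I noticed that they did not appear to be running in RT mode. LMS has changed somewhat in 19c, so here is a brief record.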
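On Linux, the kernel scheduling policies for a process fall into three classes:
TS – SCHED_OTHER (SCHED_NORMAL): time-sharing scheduling, the general-purpose default;
FF – SCHED_FIFO: real-time scheduling, first in first out;
RR – SCHED_RR: real-time scheduling, round robin with time slices;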
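To see which policy a process or thread is running under, the CLS column printed by ps can be mapped back to the policy names listed above. A minimal sketch, covering only the three classes discussed here; sched_policy_name and show_thread_policies are hypothetical helper names, not part of any tool:

```shell
# Map the CLS code printed by `ps -c` / `ps -o cls` to the Linux policy name.
# Only the three classes discussed in this post are handled.
sched_policy_name() {
  case "$1" in
    TS) echo "SCHED_OTHER" ;;
    FF) echo "SCHED_FIFO" ;;
    RR) echo "SCHED_RR" ;;
    *)  echo "unknown" ;;
  esac
}

# For every thread of the PID given as $1, print:
#   <thread id> <class> <real-time priority> <policy name>
show_thread_policies() {
  ps -L -p "$1" -o lwp= -o cls= -o rtprio= | while read -r lwp cls rtprio; do
    echo "$lwp $cls $rtprio $(sched_policy_name "$cls")"
  done
}
```

For example, show_thread_policies 66573 would list one line per thread of that process, which is useful when a single thread has a different class from the rest.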
First, a healthy environment: Oracle 19c two-node RAC on RHEL 7.8.
# db alert log
Starting background process CLMN
CLMN started with pid=3, OS id=28714
Starting background process PSP0
PSP0 started with pid=4, OS id=28731
Starting background process IPC0
2021-03-23 10:07:32.440000 +08:00
IPC0 started with pid=5, OS id=29420
Starting background process VKTM
Starting background process GEN0
VKTM started with pid=6, OS id=29445 at elevated (RT) priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Starting background process MMAN
Starting background process LMD1
LMD0 started with pid=23, OS id=29631
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [130560 - 174080]
LMS1 started with pid=26, OS id=29640_29663 at elevated (RT) priority
LMS0 started with pid=24, OS id=29635_29662 at elevated (RT) priority
LMS2 started with pid=28, OS id=29646_29666 at elevated (RT) priority
Starting background process LMD2
LMD1 started with pid=36, OS id=29659
LMS3 started with pid=30, OS id=29649_29667 at elevated (RT) priority
LMS4 started with pid=32, OS id=29651_29672 at elevated (RT) priority
LMS5 started with pid=34, OS id=29653_29677 at elevated (RT) priority
Starting background process LMD3
LMD2 started with pid=37, OS id=29681
LMD3 started with pid=38, OS id=29686
Starting background process RMS0
RMS0 started with pid=39, OS id=29689
oracle@anbob_com:/home/oracle> ps -efc|grep vktm
grid 34874 1 RR 41 Jun03 ? 00:06:20 asm_vktm_+ASM1
oracle 42358 1 RR 41 Jun03 ? 00:05:24 ora_vktm_anbob1
grid 58462 1 RR 41 Jun03 ? 00:06:18 mdb_vktm_-MGMTDB
Note: when process priority is checked with the ps -c option, VKTM shows the RR class.
oracle@anbob_com:/home/oracle> ps -efc|grep lms
oracle 35148 90946 TS 19 16:02 pts/3 00:00:00 grep --color=auto lms
oracle 66573 1 TS 19 May21 ? 04:32:32 ora_lms0_anbob1
oracle 66576 1 TS 19 May21 ? 04:29:41 ora_lms1_anbob1
oracle 66578 1 TS 19 May21 ? 04:26:33 ora_lms2_anbob1
oracle 66581 1 TS 19 May21 ? 04:26:51 ora_lms3_anbob1
oracle 66586 1 TS 19 May21 ? 04:25:38 ora_lms4_anbob1
oracle 66589 1 TS 19 May21 ? 04:28:44 ora_lms5_anbob1
oracle 66596 1 TS 19 May21 ? 04:25:44 ora_lms6_anbob1
oracle 66599 1 TS 19 May21 ? 04:50:02 ora_lms7_anbob1
oracle 66603 1 TS 19 May21 ? 04:22:42 ora_lms8_anbob1
oracle 66609 1 TS 19 May21 ? 04:21:31 ora_lms9_anbob1
oracle 66615 1 TS 19 May21 ? 04:25:41 ora_lmsa_anbob1
oracle 66620 1 TS 19 May21 ? 04:29:43 ora_lmsb_anbob1
grid 129022 1 TS 19 May14 ? 00:36:49 asm_lms0_+ASM1
Note:
With the same ps -c option, however, the LMS processes still show the TS class. In 12c and earlier, ps showed LMS in RR mode as well, as below:
# sqlplus -V
SQL*Plus: Release 12.2.0.1.0 Production
# ps -eLfc |head -n 1;ps -eLfc|grep lms
UID PID PPID LWP NLWP CLS PRI STIME TTY TIME CMD
grid 14661 1 14661 1 RR 41 2019 ? 1-08:14:40 asm_lms0_+ASM1
oracle 62106 1 62106 1 RR 41 2019 ? 17-22:45:22 ora_lms0_weejar1
oracle 62109 1 62109 1 RR 41 2019 ? 18-10:30:26 ora_lms1_weejar1
oracle 62111 1 62111 1 RR 41 2019 ? 18-00:13:16 ora_lms2_weejar1
oracle 62113 1 62113 1 RR 41 2019 ? 17-22:02:20 ora_lms3_weejar1
oracle 62115 1 62115 1 RR 41 2019 ? 17-22:07:53 ora_lms4_weejar1
# Check the oradism file
oracle@anbob_com:/home/oracle> ls -l $ORACLE_HOME/bin/oradism
-rwsr-x--- 1 root oinstall 147848 Apr 17 2019 /oracle/app/oracle/product/19c/db_1/bin/oradism
OK: the file is owned by root and has the setuid bit set.
Note:
For 10gR2 and 11gR1 installations, verify that the oradism executable matches the following ownership and permissions “-rwsr-sr-x 1 root dba oradism” and make sure the lms is running in Real Time mode.
# Check the mount point of the ORACLE_HOME file system
oracle@anbob_com:/home/oracle> cat /proc/mounts|grep oracle
/dev/mapper/fusioncube-oracle /oracle ext4 rw,relatime,stripe=16,data=ordered 0 0
OK: the mount options do not include nosuid.
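The mount options presumably matter because a file system mounted with nosuid ignores the setuid bit, in which case a setuid-root oradism could not do its job. That reasoning is an assumption about why the mount point is checked here. The sketch below tests an options string like the one printed in /proc/mounts; has_nosuid and mount_options are hypothetical helper names:

```shell
# Return 0 if the comma-separated mount options in $1 contain nosuid, 1 otherwise.
has_nosuid() {
  case ",$1," in
    *,nosuid,*) return 0 ;;
    *)          return 1 ;;
  esac
}

# Print the options field for mount point $1 from a /proc/mounts-style file.
# $2 is optional and defaults to /proc/mounts.
mount_options() {
  awk -v mp="$1" '$2 == mp { print $4 }' "${2:-/proc/mounts}"
}
```

On the healthy node, has_nosuid "$(mount_options /oracle)" would return non-zero, which matches the listing above.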
# LMS in the AWR report
RAC Statistics
Metric | Begin | End | |
---|---|---|---|
Number of Instances: | 2 | 2 | |
Number of LMS’s: | 12 | 12 | |
Number of realtime LMS’s: | 12 | 12 | (0 priority changes) |
# Check the background processes
SQL> select 'LMS', INST_ID,PRIORITY,COUNT(*) TOTAL FROM GV$BGPROCESS where name like 'LMS%' GROUP BY INST_ID,PRIORITY ;
'LMS' INST_ID PRIORITY TOTAL
------ ---------- ---------------- ----------
LMS 1 RT 12
LMS 2 RT 12
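Everything indicates that the LMS processes are currently in RT mode, yet ps still shows them as TS. Is this just a display issue, or has Oracle changed something? It has: starting with 18c, LMS runs in threaded mode, so the scheduling class has to be checked per thread with ps -L.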
oracle@anbob_com:/home/oracle> ps -eLfc |head -n 1;ps -eLfc|grep lms
UID PID PPID LWP NLWP CLS PRI STIME TTY TIME CMD
oracle 66573 1 66573 4 TS 19 May21 ? 00:00:08 ora_lms0_anbob1
oracle 66573 1 66580 4 RR 41 May21 ? 03:15:29 ora_lms0_anbob1
oracle 66573 1 67219 4 TS 19 May21 ? 00:23:08 ora_lms0_anbob1
oracle 66573 1 67240 4 TS 19 May21 ? 00:53:41 ora_lms0_anbob1
oracle 66576 1 66576 4 TS 19 May21 ? 00:00:08 ora_lms1_anbob1
oracle 66576 1 66582 4 RR 41 May21 ? 03:12:36 ora_lms1_anbob1
oracle 66576 1 67270 4 TS 19 May21 ? 00:23:09 ora_lms1_anbob1
oracle 66576 1 67301 4 TS 19 May21 ? 00:53:43 ora_lms1_anbob1
oracle 66578 1 66578 4 TS 19 May21 ? 00:00:08 ora_lms2_anbob1
oracle 66578 1 66591 4 RR 41 May21 ? 03:10:10 ora_lms2_anbob1
oracle 66578 1 67339 4 TS 19 May21 ? 00:22:52 ora_lms2_anbob1
...
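OK. Each LMS process has four threads (NLWP = 4), and one of them (for example LWP 66580 in ora_lms0) runs in the RR class while the others stay in TS.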
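Since the real-time class now belongs to one thread rather than to the whole process, a per-process summary of the thread-level listing is easier to scan than the raw ps output. A rough sketch, assuming the ps -eLfc column layout shown above (PID in column 2, CLS in column 6, command name last); summarize_lms_threads is a hypothetical name:

```shell
# Read `ps -eLfc` output on stdin and print one line per LMS process:
#   <command> pid=<pid> threads=<thread count> rr=<threads in class RR>
# Lines whose last field is not an *_lms<x>_* process name are ignored.
summarize_lms_threads() {
  awk '$NF ~ /_lms[0-9a-z]+_/ {
         key = $NF " pid=" $2
         total[key]++
         if ($6 == "RR") rr[key]++
       }
       END {
         for (k in total) printf "%s threads=%d rr=%d\n", k, total[k], rr[k]
       }' | sort
}
```

Run as ps -eLfc | summarize_lms_threads. A process reported with rr=0 has no real-time thread and deserves a closer look.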
Now a problem environment: Oracle 19.4 two-node RAC on RHEL 7.5.
RAC Statistics
Metric | Begin | End | |
---|---|---|---|
Number of Instances: | 2 | 2 | |
Number of LMS’s: | 40 | 40 | |
Number of realtime LMS’s: | 0 | 0 | (0 priority changes) |
SQL> select * from v$bgprocess where name like 'LMS%';
PADDR PSERIAL# NAME DESCRIPTION PRIORITY CON_ID
---------------- ---------- ----- -------------------------------- -------- ----------
0000001E01B628A0 1 LMS0 global cache service process TS 0
0000001E01B65360 1 LMS7 global cache service process TS 0
0000001E01B67E20 1 LMSE global cache service process TS 0
0000001E01B6A8E0 1 LMSL global cache service process TS 0
0000001E01B6D3A0 1 LMSS global cache service process TS 0
0000001E01B6FE60 1 LMSZ global cache service process TS 0
0000001E21AC8498 1 LMS3 global cache service process TS 0
0000001E21ACAF58 1 LMSA global cache service process TS 0
0000001E21ACDA18 1 LMSH global cache service process TS 0
0000001E21AD04D8 1 LMSO global cache service process TS 0
0000001E21AD2F98 1 LMSV global cache service process TS 0
0000001E41A66B58 1 LMS6 global cache service process TS 0
...
# db alert log
2021-06-03T10:50:19.500768+08:00
LMON started with pid=22, OS id=98747
Starting background process LMD0
2021-06-03T10:50:19.527437+08:00
LMD0 started with pid=23, OS id=98749
Starting background process LMD1
2021-06-03T10:50:19.528918+08:00
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [230400 - 307200]
2021-06-03T10:50:19.703222+08:00
Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lms0_98751_98758.trc (incident=873064):
ORA-00800: soft external error, arguments: [Set Priority Failed], [LMS0], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
Incident details in: /u01/oracle/diag/rdbms/anbob1/anbob11/incident/incdir_873064/anbob11_lms0_98751_98758_i873064.trc
2021-06-03T10:50:19.711460+08:00
Error attempting to elevate LMS0's priority: no further priority changes will be attempted for this process
LMS0 started with pid=24, OS id=98751_98758
2021-06-03T10:50:19.800751+08:00
Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lmsd_98808_98825.trc (incident=873065):
ORA-00800: soft external error, arguments: [Set Priority Failed], [LMSD], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
2021-06-03T10:50:19.815049+08:00
Error attempting to elevate LMSD's priority: no further priority changes will be attempted for this process
LMSD started with pid=50, OS id=98808_98825
2021-06-03T10:50:19.924836+08:00
LMD1 started with pid=104, OS id=98950
2021-06-03T10:50:19.924929+08:00
Starting background process LMD2
2021-06-03T10:50:19.944617+08:00
Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lmsb_98797_98815.trc (incident=873066):
ORA-00800: soft external error, arguments: [Set Priority Failed], [LMSB], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
2021-06-03T10:50:19.945838+08:00
Error attempting to elevate LMSB's priority: no further priority changes will be attempted for this process
Starting background process LMD3
2021-06-03T10:50:19.949748+08:00
Note:
The LMS processes in this environment run in TS mode because the attempt to raise their priority failed at instance startup with ORA-00800 [Set Priority Failed].
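To list quickly which background processes failed to get elevated priority, the process name can be pulled out of the ORA-00800 lines in the alert log. A small sketch that relies only on the message format shown above; priority_failures is a hypothetical name, and the alert log path is left for the reader to fill in:

```shell
# Read alert log text on stdin and print the distinct process names
# reported in ORA-00800 [Set Priority Failed] messages, one per line.
priority_failures() {
  grep 'ORA-00800' |
    sed -n 's/.*\[Set Priority Failed], \[\([^]]*\)].*/\1/p' |
    sort -u
}
```

It reads from standard input, so it can be fed the alert log of each instance in turn. An empty result means no process reported this failure in that log.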
# Check oradism
oracle@anbob1a:/home/oracle/scripts_oracle$ ls -l $ORACLE_HOME/bin/oradism
-rwxr-x--- 1 oracle oinstall 147848 Apr 17 2019 /u01/oracle/product/bin/oradism
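In this environment both the owner and the permissions of oradism are wrong: the file is owned by oracle instead of root and has no setuid bit. Correcting them and restarting the instance resolves the problem.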
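The ownership check is easy to script so that it can be repeated on every node. A minimal sketch, assuming the listing from the healthy environment (owner root, setuid bit set for the owner) is the target state; oradism_ok and check_oradism_file are hypothetical helper names:

```shell
# Return 0 when the owner is root and the owner-execute slot of the
# ls-style permission string (for example -rwsr-x---) shows the setuid bit.
oradism_ok() {
  owner=$1
  perms=$2
  [ "$owner" = "root" ] || return 1
  case "$perms" in
    ???s*) return 0 ;;
    *)     return 1 ;;
  esac
}

# Check a real file with GNU stat (%U = owner name, %A = permission string).
check_oradism_file() {
  oradism_ok "$(stat -c '%U' "$1")" "$(stat -c '%A' "$1")"
}
```

On the problem node, check_oradism_file "$ORACLE_HOME/bin/oradism" would return non-zero. Restoring the state seen on the healthy node corresponds to chown root:oinstall and chmod 4750 on that file, run as root, assuming oinstall is the install group as in the listings above.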
Alternatively, as root you can try to switch the process to RR mode online with chrt:
# chrt -r -p 1 [lms pid]
— update 2022-11-22 —
Oracle 19c turns out to have two known bugs in this area:
There are two issues covered under this Exadata critical issue alert.
Issue #1 – Due to bug 33610957 and bug 34534868, during operating system startup Oracle Clusterware may not start because the CSS daemon (OCSSD process ocssd.bin) cannot be set to run with real-time scheduler priority. Failed clusterware startup can cause the following:
Extended service outage during a planned maintenance event
Delayed recovery from an unplanned outage
Interruption of rolling maintenance orchestrations (e.g. interruption of Exadata Cloud Service infrastructure update)
Issue #2 – A related issue, bug 34286265 and bug 34318125, may occur where set process priority fails on critical database background processes, such as VKTM and LMS, which can result in performance degradation. This issue occurs only in environments where database instances are started using the SQL*Plus utility.