LMSn not running in RT (real time) mode Oracle 19c RAC?

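When CPU on a database host is close to exhaustion, Oracle wants a handful of core background processes to get CPU at the highest possible priority. High CPU usage also lengthens I/O response times and increases network latency, which indirectly hurts overall database performance. Since Oracle 10g the hidden parameter _high_priority_processes has controlled which processes run at high priority, and 19c adds _highest_priority_processes, which gives VKTM the highest priority. In 10.2 the default _high_priority_processes raised the priority of the core RAC processes LMS*; in 11g it covered LMS*|VKTM; in 19c VKTM runs at the highest priority and _high_priority_processes covers a longer list: LMS*|LM*|LCK0|GCR*|CKPT|DBRM|RMS0|LGWR|CR*|RMV*. I recall that before 10.2.0.3 there was a bug that could make these processes consume excessive CPU.

Recently a customer saw pronounced GC (global cache) problems once CPU usage went above 90%. While looking at LMS I noticed that it was not running in RT mode. LMS has changed somewhat in 19c, so here are some brief notes.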

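On Linux, a process runs under one of three kernel scheduling policies:

TS – SCHED_OTHER (SCHED_NORMAL): time-sharing scheduling, the general-purpose default;
FF – SCHED_FIFO: real-time scheduling, first in first out;
RR – SCHED_RR: real-time scheduling, round-robin time slices;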
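
To make the ps class codes easier to read, here is a minimal sketch that maps the CLS column to the policy names listed above. The function names policy_name and annotate_policies are my own for illustration; the sketch assumes procps ps, whose cls, pid and comm output fields are used in the commented example.

```shell
#!/bin/sh
# Translate the CLS code printed by `ps -c` into the kernel policy name.
policy_name() {
  case "$1" in
    TS) echo "SCHED_OTHER" ;;  # time-sharing, the default
    FF) echo "SCHED_FIFO" ;;   # real-time, first in first out
    RR) echo "SCHED_RR" ;;     # real-time, round robin
    *)  echo "other" ;;        # any class not covered in this post
  esac
}

# Read "PID CLS COMMAND" lines on stdin and print the policy name for each.
annotate_policies() {
  while read -r pid cls cmd; do
    [ -n "$pid" ] || continue
    printf '%s %s %s\n' "$pid" "$(policy_name "$cls")" "$cmd"
  done
}

# Example with procps ps (header suppressed by the trailing "="):
#   ps -eo pid=,cls=,comm= | annotate_policies
```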

First, a healthy environment: Oracle 19c RAC, 2 nodes, on RHEL 7.8

# db alert log
Starting background process CLMN
CLMN started with pid=3, OS id=28714
Starting background process PSP0
PSP0 started with pid=4, OS id=28731
Starting background process IPC0
2021-03-23 10:07:32.440000 +08:00
IPC0 started with pid=5, OS id=29420
Starting background process VKTM
Starting background process GEN0
VKTM started with pid=6, OS id=29445 at elevated (RT) priority
VKTM running at (1)millisec precision with DBRM quantum (100)ms
Starting background process MMAN

Starting background process LMD1
LMD0 started with pid=23, OS id=29631
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [130560 - 174080]
LMS1 started with pid=26, OS id=29640_29663 at elevated (RT) priority
LMS0 started with pid=24, OS id=29635_29662 at elevated (RT) priority
LMS2 started with pid=28, OS id=29646_29666 at elevated (RT) priority
Starting background process LMD2
LMD1 started with pid=36, OS id=29659
LMS3 started with pid=30, OS id=29649_29667 at elevated (RT) priority
LMS4 started with pid=32, OS id=29651_29672 at elevated (RT) priority
LMS5 started with pid=34, OS id=29653_29677 at elevated (RT) priority
Starting background process LMD3
LMD2 started with pid=37, OS id=29681
LMD3 started with pid=38, OS id=29686
Starting background process RMS0
RMS0 started with pid=39, OS id=29689


oracle@anbob_com:/home/oracle>  ps -efc|grep vktm
grid      34874      1 RR   41 Jun03 ?        00:06:20 asm_vktm_+ASM1
oracle    42358      1 RR   41 Jun03 ?        00:05:24 ora_vktm_anbob1
grid      58462      1 RR   41 Jun03 ?        00:06:18 mdb_vktm_-MGMTDB

Note:
With the ps -c option, which shows the scheduling class, vktm is in RR mode.
oracle@anbob_com:/home/oracle> ps -efc|grep lms
oracle    35148  90946 TS   19 16:02 pts/3    00:00:00 grep --color=auto lms
oracle    66573      1 TS   19 May21 ?        04:32:32 ora_lms0_anbob1
oracle    66576      1 TS   19 May21 ?        04:29:41 ora_lms1_anbob1
oracle    66578      1 TS   19 May21 ?        04:26:33 ora_lms2_anbob1
oracle    66581      1 TS   19 May21 ?        04:26:51 ora_lms3_anbob1
oracle    66586      1 TS   19 May21 ?        04:25:38 ora_lms4_anbob1
oracle    66589      1 TS   19 May21 ?        04:28:44 ora_lms5_anbob1
oracle    66596      1 TS   19 May21 ?        04:25:44 ora_lms6_anbob1
oracle    66599      1 TS   19 May21 ?        04:50:02 ora_lms7_anbob1
oracle    66603      1 TS   19 May21 ?        04:22:42 ora_lms8_anbob1
oracle    66609      1 TS   19 May21 ?        04:21:31 ora_lms9_anbob1
oracle    66615      1 TS   19 May21 ?        04:25:41 ora_lmsa_anbob1
oracle    66620      1 TS   19 May21 ?        04:29:43 ora_lmsb_anbob1
grid     129022      1 TS   19 May14 ?        00:36:49 asm_lms0_+ASM1

Note:
With the ps -c option, however, lms still shows TS mode. In 12c and earlier, ps showed RR mode for lms as well, as below:

# sqlplus -V
SQL*Plus: Release 12.2.0.1.0 Production

# ps -eLfc |head -n 1;ps -eLfc|grep lms
UID        PID  PPID   LWP NLWP CLS PRI STIME TTY          TIME CMD
grid     14661     1 14661    1 RR   41  2019 ?        1-08:14:40 asm_lms0_+ASM1
oracle   62106     1 62106    1 RR   41  2019 ?        17-22:45:22 ora_lms0_weejar1
oracle   62109     1 62109    1 RR   41  2019 ?        18-10:30:26 ora_lms1_weejar1
oracle   62111     1 62111    1 RR   41  2019 ?        18-00:13:16 ora_lms2_weejar1
oracle   62113     1 62113    1 RR   41  2019 ?        17-22:02:20 ora_lms3_weejar1
oracle   62115     1 62115    1 RR   41  2019 ?        17-22:07:53 ora_lms4_weejar1

# Check the oradism file

oracle@anbob_com:/home/oracle> ls -l $ORACLE_HOME/bin/oradism
-rwsr-x--- 1 root oinstall 147848 Apr 17 2019 /oracle/app/oracle/product/19c/db_1/bin/oradism

This is fine.

Note:
For 10gR2 and 11gR1 installations, verify that the oradism executable matches the following ownership and permissions “-rwsr-sr-x 1 root dba oradism” and make sure the lms is running in Real Time mode.
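
The same check can be scripted. The sketch below only tests the two things visible in the listing above: the owner is root and the setuid bit is set. The function names are hypothetical, and it assumes GNU stat (`stat -c`) as found on RHEL.

```shell
#!/bin/sh
# $1 = owner name, $2 = octal mode as printed by `stat -c %a` (e.g. 4750).
oradism_perm_ok() {
  [ "$1" = "root" ] || return 1
  case "$2" in
    [4567]???) return 0 ;;  # a leading digit with the 4 bit set means setuid
    *) return 1 ;;
  esac
}

# Check one oradism binary; defaults to the one under $ORACLE_HOME.
check_oradism() {
  f=${1:-$ORACLE_HOME/bin/oradism}
  info=$(stat -c '%U %a' "$f") || return 2
  owner=${info% *}
  mode=${info#* }
  if oradism_perm_ok "$owner" "$mode"; then
    echo "OK: $f ($owner $mode)"
  else
    echo "BAD: $f ($owner $mode), expected owner root with setuid"
    return 1
  fi
}

# Example: check_oradism "$ORACLE_HOME/bin/oradism"
```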

# Check the mount options of the ORACLE_HOME file system

oracle@anbob_com:/home/oracle> cat /proc/mounts|grep oracle
/dev/mapper/fusioncube-oracle /oracle ext4 rw,relatime,stripe=16,data=ordered 0 0

This is fine.
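
The post does not say what to look for in the mount options. My assumption is that the concern is nosuid: on a file system mounted with nosuid the kernel ignores setuid bits, so oradism could not gain root privileges even with correct ownership. A small sketch with a hypothetical helper name:

```shell
#!/bin/sh
# Return success if a comma-separated mount option string contains nosuid.
has_nosuid() {
  case ",$1," in
    *,nosuid,*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example (findmnt from util-linux; -T resolves the mount that holds the path):
#   if has_nosuid "$(findmnt -no OPTIONS -T "$ORACLE_HOME")"; then
#     echo "ORACLE_HOME is on a nosuid mount"
#   fi
```

The options shown above (rw,relatime,stripe=16,data=ordered) contain no nosuid, which matches the conclusion that the mount is fine.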

# LMS in the AWR report

RAC Statistics

                              Begin   End
Number of Instances:              2     2
Number of LMS’s:                 12    12
Number of realtime LMS’s:        12    12   (0 priority changes)

# Check the background processes

SQL> select 'LMS', INST_ID,PRIORITY,COUNT(*) TOTAL FROM GV$BGPROCESS where name like 'LMS%' GROUP BY INST_ID,PRIORITY ;

'LMS'     INST_ID PRIORITY              TOTAL
------ ---------- ---------------- ----------
LMS             1 RT                       12
LMS             2 RT                       12

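Everything indicates that the LMS processes are in RT mode, yet ps still shows them as TS. Is this a display problem, or did Oracle change something? It did: starting with 18c, the LMS processes are multithreaded. Without -L, ps reports only the main thread of each process; listing the threads shows that one thread per LMS process runs as RR: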

oracle@anbob_com:/home/oracle> ps -eLfc |head -n 1;ps -eLfc|grep lms
UID         PID   PPID    LWP NLWP CLS PRI STIME TTY          TIME CMD
oracle    66573      1  66573    4 TS   19 May21 ?        00:00:08 ora_lms0_anbob1
oracle    66573      1  66580    4 RR   41 May21 ?        03:15:29 ora_lms0_anbob1
oracle    66573      1  67219    4 TS   19 May21 ?        00:23:08 ora_lms0_anbob1
oracle    66573      1  67240    4 TS   19 May21 ?        00:53:41 ora_lms0_anbob1
oracle    66576      1  66576    4 TS   19 May21 ?        00:00:08 ora_lms1_anbob1
oracle    66576      1  66582    4 RR   41 May21 ?        03:12:36 ora_lms1_anbob1
oracle    66576      1  67270    4 TS   19 May21 ?        00:23:09 ora_lms1_anbob1
oracle    66576      1  67301    4 TS   19 May21 ?        00:53:43 ora_lms1_anbob1
oracle    66578      1  66578    4 TS   19 May21 ?        00:00:08 ora_lms2_anbob1
oracle    66578      1  66591    4 RR   41 May21 ?        03:10:10 ora_lms2_anbob1
oracle    66578      1  67339    4 TS   19 May21 ?        00:22:52 ora_lms2_anbob1
...

ok.
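
With a dozen or more LMS processes the thread listing gets long, so a per-process summary helps. This is a rough sketch: lms_rt_threads is a name I made up, it expects "PID LWP CLS CMD" lines on stdin, and it counts RR or FF threads as real-time.

```shell
#!/bin/sh
# Summarise threads per LMS process: total thread count and how many are real-time.
# Input: one "PID LWP CLS CMD" line per thread on stdin.
lms_rt_threads() {
  awk '$4 ~ /_lms[0-9a-z]_/ {
         total[$1]++
         name[$1] = $4
         if ($3 == "RR" || $3 == "FF") rt[$1]++
       }
       END {
         for (p in total)
           printf "%s %s threads=%d rt=%d\n", p, name[p], total[p], rt[p] + 0
       }' | sort -n
}

# Example with procps ps (one line per thread, no header):
#   ps -eLo pid=,lwp=,cls=,args= | lms_rt_threads
```

On the healthy system above each LMS process would report rt=1; rt=0 for every process suggests the priority elevation failed.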

Now a problem environment: Oracle 19.4 RAC, 2 nodes, on RHEL 7.5

RAC Statistics

                              Begin   End
Number of Instances:              2     2
Number of LMS’s:                 40    40
Number of realtime LMS’s:         0     0   (0 priority changes)
SQL> select * from v$bgprocess where name like 'LMS%';
PADDR              PSERIAL# NAME  DESCRIPTION                      PRIORITY     CON_ID
---------------- ---------- ----- -------------------------------- -------- ----------
0000001E01B628A0          1 LMS0  global cache service process     TS                0
0000001E01B65360          1 LMS7  global cache service process     TS                0
0000001E01B67E20          1 LMSE  global cache service process     TS                0
0000001E01B6A8E0          1 LMSL  global cache service process     TS                0
0000001E01B6D3A0          1 LMSS  global cache service process     TS                0
0000001E01B6FE60          1 LMSZ  global cache service process     TS                0
0000001E21AC8498          1 LMS3  global cache service process     TS                0
0000001E21ACAF58          1 LMSA  global cache service process     TS                0
0000001E21ACDA18          1 LMSH  global cache service process     TS                0
0000001E21AD04D8          1 LMSO  global cache service process     TS                0
0000001E21AD2F98          1 LMSV  global cache service process     TS                0
0000001E41A66B58          1 LMS6  global cache service process     TS                0
...

# db alert log

2021-06-03T10:50:19.500768+08:00
LMON started with pid=22, OS id=98747
Starting background process LMD0
2021-06-03T10:50:19.527437+08:00
LMD0 started with pid=23, OS id=98749
Starting background process LMD1
2021-06-03T10:50:19.528918+08:00
* Load Monitor used for high load check
* New Low - High Load Threshold Range = [230400 - 307200]
2021-06-03T10:50:19.703222+08:00
Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lms0_98751_98758.trc  (incident=873064):
ORA-00800: soft external error, arguments: [Set Priority Failed], [LMS0], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
Incident details in: /u01/oracle/diag/rdbms/anbob1/anbob11/incident/incdir_873064/anbob11_lms0_98751_98758_i873064.trc
2021-06-03T10:50:19.711460+08:00
Error attempting to elevate LMS0's priority: no further priority changes will be attempted for this process
LMS0 started with pid=24, OS id=98751_98758
2021-06-03T10:50:19.800751+08:00
Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lmsd_98808_98825.trc  (incident=873065):
ORA-00800: soft external error, arguments: [Set Priority Failed], [LMSD], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
2021-06-03T10:50:19.815049+08:00
Error attempting to elevate LMSD's priority: no further priority changes will be attempted for this process
LMSD started with pid=50, OS id=98808_98825
2021-06-03T10:50:19.924836+08:00
LMD1 started with pid=104, OS id=98950
2021-06-03T10:50:19.924929+08:00
Starting background process LMD2
2021-06-03T10:50:19.944617+08:00
Errors in file /u01/oracle/diag/rdbms/anbob1/anbob11/trace/anbob11_lmsb_98797_98815.trc  (incident=873066):
ORA-00800: soft external error, arguments: [Set Priority Failed], [LMSB], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
2021-06-03T10:50:19.945838+08:00
Error attempting to elevate LMSB's priority: no further priority changes will be attempted for this process
Starting background process LMD3
2021-06-03T10:50:19.949748+08:00

Note:
The LMS processes in this environment run in TS mode because raising their priority failed at instance startup with ORA-00800 [Set Priority Failed].
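
To see quickly which processes hit this error, the alert log can be filtered for the ORA-00800 lines. A minimal sketch, assuming the message format shown in the excerpt above; priority_failures is a hypothetical helper name and the alert log path is whatever applies to your instance.

```shell
#!/bin/sh
# Read alert log text on stdin and print each process name that reported
# ORA-00800 [Set Priority Failed], one per line, without duplicates.
priority_failures() {
  sed -n 's/.*\[Set Priority Failed\], \[\([^]]*\)\].*/\1/p' | sort -u
}

# Example: priority_failures < /path/to/alert_SID.log
```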

# Check oradism

oracle@anbob1a:/home/oracle/scripts_oracle$ ls -l $ORACLE_HOME/bin/oradism
-rwxr-x--- 1 oracle oinstall 147848 Apr 17 2019 /u01/oracle/product/bin/oradism

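In this environment both the owner and the permissions are wrong: the file belongs to oracle instead of root and has no setuid bit. Correcting them and restarting the instance resolves the problem.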
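
As a sketch of the correction, the helper below only prints the commands that would bring oradism back to the ownership and mode seen on the healthy system (root:oinstall, -rwsr-x---, i.e. 4750). It changes nothing itself. The function name is made up, and the group oinstall is taken from the listings in this post, so adjust it for your installation.

```shell
#!/bin/sh
# Print (do not run) the commands that restore oradism ownership and setuid mode.
# $1 = path to oradism (default: $ORACLE_HOME/bin/oradism)
# $2 = group (default: oinstall, as in the healthy listing)
oradism_fix_commands() {
  f=${1:-$ORACLE_HOME/bin/oradism}
  grp=${2:-oinstall}
  printf 'chown root:%s %s\n' "$grp" "$f"
  printf 'chmod 4750 %s\n' "$f"
}

# Example: review the output, then run the printed commands as root.
#   oradism_fix_commands "$ORACLE_HOME/bin/oradism" oinstall
```

chown goes first because changing the owner of a file can clear its setuid bit, so the mode is set afterwards.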

Alternatively, as root, you can try chrt to switch the process to RR mode online:

# chrt -r -p 1 [lms pid]

— update 2022-11-22 —

It turns out that Oracle 19c has two known bugs in this area:

There are two issues covered under this Exadata critical issue alert.

Issue #1 – Due to bug 33610957 and bug 34534868, during operating system startup Oracle Clusterware may not start because the CSS daemon (OCSSD process ocssd.bin) cannot be set to run with real-time scheduler priority. Failed clusterware startup can cause the following:

Extended service outage during a planned maintenance event
Delayed recovery from an unplanned outage
Interruption of rolling maintenance orchestrations (e.g. interruption of Exadata Cloud Service infrastructure update)

Issue #2 – A related issue, bug 34286265 and bug 34318125, may occur where set process priority fails on critical database background processes, such as VKTM and LMS, which can result in performance degradation. This issue occurs only in environments where database instances are started using the SQL*Plus utility.
