Table of Contents
- 1. Symptoms
- 2. Cause Analysis
- 3. Notes Copied from MOS
- 3.1. Cause
- 3.2. Solution
- 3.2.1. ORADISM Permission:
- 3.2.2. Check the priority of VKTM or LMS* @RDBMS level:
- 3.2.3. Cgroup Configuration
- 3.2.4. Priority and Runtime
- 3.2.5. "nosuid" option :
- 3.2.6. Oradism and Oracle binary permission difference:
- 3.2.7. Apply this patch - Bug 34318125 - ORA-00800: SOFT EXTERNAL ERROR, ARGUMENTS: [SET PRIORITY FAILED] ON BG PROCESSES
- 3.2.8. Apply this merge patch - Bug 34286265 Bug 34318125 Bug 34672698
- 4. Open Questions
1. Symptoms
When the Oracle instance is started manually, the log shows many errors from the [LMS*] processes, similar to the following:
Errors in file /u01/app/oracle/diag/rdbms/boss/boss1/trace/boss1_vktm_93651.trc (incident=144083) (PDBNAME=CDB$ROOT):
ORA-00800: soft external error, arguments: [Set Priority Failed], [VKTM], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
......
Errors in file /u01/app/oracle/diag/rdbms/boss/boss1/trace/boss1_lmst_93855_93869.trc (incident=240067):
ORA-00800: soft external error, arguments: [Set Priority Failed], [LMST], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
2. Cause Analysis
[oracle@xxxx trace]$ oerr ora 800
00800, 00000, "soft external error, arguments: [%s], [%s], [%s], [%s], [%s]"
// *Cause: An improper system configuration or setting resulted in failure.
//         This failure is not fatal to the instance at the moment, however, this might result
//         in an unexpected behavior during query execution.
// *Action: Check the database trace files and rectify system settings or the configuration.
//          For additional information, refer to Oracle database documentation or refer to
//          My Oracle Support (MOS) notes.
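Judging from this message provided by Oracle, the cause is a system configuration that is incorrect or has not taken effect. The error is not fatal, but later query operations may behave abnormally.
The message is vague and does not say which setting is wrong, so the only option is to look it up on MOS.
3. Notes Copied from MOS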
3.1. Cause
This applies to 19c configured on OL7/UEK4. UEK4 requires rt_period and rt_runtime to be set for the cgroup from which the database is started. This is typically the root cgroup, unless the customer sets up their own cgroups or uses system or user slices. The values must be appropriate, for example a runtime that is 95% of the period. For this problem, ORA-800 will have the following message in the incident.
3.2. Solution
This note is for ORA-800 [VKTM] issues. If you are seeing only warning messages like the ones below, follow Note 1347586.1 for the solution:
Time drift detected. Please check VKTM trace file for more details.
Warning: VKTM detected a time drift
3.2.1. ORADISM Permission:
Check the oradism permission in $ORACLE_HOME/bin. The file should be owned by root with the setuid bit set, that is, owned by root:oinstall with permission 4750.
$ cd $ORACLE_HOME/bin
$ ls -lrt oradism
-rwsr-x--- 1 root oinstall 147848 Apr 17 2019 oradism
You can change the permission as below:
chown root $ORACLE_HOME/bin/oradism
chmod 4750 $ORACLE_HOME/bin/oradism
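The ownership and mode check above can also be scripted. The following is a minimal sketch, assuming GNU `stat` (as shipped with Oracle Linux). The function name `check_oradism` and its optional second argument (the expected owner UID, defaulting to 0 for root) are illustrative only and are not part of any Oracle tooling.

```shell
#!/usr/bin/env bash
# check_oradism FILE [EXPECTED_UID]
# Hypothetical helper: succeeds only if FILE is owned by EXPECTED_UID
# (default 0, i.e. root) and its octal mode is exactly 4750 (setuid, rwxr-x---).
check_oradism() {
  local file="$1" want_uid="${2:-0}"
  local uid mode
  # GNU stat: %u = numeric owner, %a = permissions in octal
  uid=$(stat -c '%u' "$file") || return 2
  mode=$(stat -c '%a' "$file") || return 2
  if [ "$uid" = "$want_uid" ] && [ "$mode" = "4750" ]; then
    echo "OK: $file has owner uid $uid and mode $mode"
  else
    echo "MISMATCH: $file has owner uid $uid and mode $mode (expected uid $want_uid, mode 4750)"
    return 1
  fi
}

# Example call, executed only when ORACLE_HOME is set in the environment.
if [ -n "${ORACLE_HOME:-}" ]; then
  check_oradism "$ORACLE_HOME/bin/oradism" || true
fi
```

The group (oinstall in the listing above) is deliberately not checked here, because the expected group name can differ between installations.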
3.2.2. Check the priority of VKTM or LMS* @RDBMS level:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.28.0.0.0

SQL> set lines 200 pages 20
SQL> col parameter for a30
SQL> col "Session Value" for a15
SQL> col "Instance Value" for a15
SQL> col "Description" for a40
SQL> select a.ksppinm "Parameter", b.ksppstvl "Session Value", c.ksppstvl "Instance Value", a.KSPPDESC "Description"
     from x$ksppi a, x$ksppcv b, x$ksppsv c
     where a.indx = b.indx and a.indx = c.indx
     and a.ksppinm like '_%'
     and a.ksppinm like '_highest_priority_process%';

Parameter                      Session Value   Instance Value  Description
------------------------------ --------------- --------------- ----------------------------------------
_highest_priority_processes    VKTM|CTWR       VKTM|CTWR       Highest Priority Process Name Mask
If VKTM is not prioritized, elevate its priority using the SQL below:
SQL> alter system set "_high_priority_processes"='VKTM' scope=spfile;
Restart the database and verify whether the VKTM priority has been elevated.
3.2.3. Cgroup Configuration
If the VKTM error persists after the settings above, verify the cgroup configuration.
Note: if everything above is configured properly and it still fails, the instance might be running in some other cgroup.
Find the PID of the VKTM process of the affected database, then check VKTM's CPU cgroup:
cat /proc/<vktm pid>/cgroup | grep cpu
If it shows a path other than the root (the root looks like 7:cpu,cpuacct:/), make sure that cgroup has the correct settings.
To find the VKTM PID, for example:
-bash-4.1$ ps -eaf|grep -i vktm |grep -v grep
oracle 15722 1 29 Sep05 ? 06:49:05 ora_vktm_orcl12102
oracle 18110 1 0 13:33 ? 00:00:01 ora_vktm_orcl12201
If the customer has the settings below for the VKTM process:
SID= > cat /proc/60698/cgroup | grep cpu
6:cpuset:/
2:cpuacct,cpu:/user.slice
apply the following settings:
1. echo 0 > /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us
2. echo 950000 > /sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.rt_runtime_us
Execute the commands above, then retry:
alter system set "_high_priority_processes"='VKTM' scope=spfile;
Restart the database and check whether ORA-800 is resolved.
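The cgroup lookup in this section comes down to finding which hierarchy carries the cpu controller for the VKTM process. A minimal sketch, assuming the cgroup v1 line format ID:controllers:path shown above; the function name `cpu_cgroup_path` is made up for illustration:

```shell
#!/usr/bin/env bash
# cpu_cgroup_path: read /proc/<pid>/cgroup-style lines on stdin and print
# the path of the hierarchy whose controller list contains "cpu".
cpu_cgroup_path() {
  awk -F: '{
    n = split($2, ctrl, ",")          # controllers are comma-separated
    for (i = 1; i <= n; i++)
      if (ctrl[i] == "cpu") {
        sub(/^[^:]*:[^:]*:/, "")      # drop "ID:controllers:", keep the path
        print
        exit
      }
  }'
}
```

For a live process, feed it the cgroup file, for example `cpu_cgroup_path < /proc/60698/cgroup`. An output of `/` means the root cgroup; anything else (such as `/user.slice`) is the cgroup whose cpu.rt_runtime_us needs checking.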
3.2.4. Priority and Runtime
If ORA-800 still occurs, use the following command to check the scheduling class and priority (the expected class is RR). Run it as the oracle user:
$ ps -eLo 'pid tid cls pri cmd comm cgroup' | egrep 'TID|vktm' | grep -v grep
For example :
-bash-4.2$ ps -eLo 'pid tid cls pri cmd comm cgroup' | egrep 'TID|vktm' |grep -v grep
PID TID CLS PRI CMD COMMAND CGROUP
8902 8902 TS 19 ora_vktm_db193cdb ora_vktm_db193c 6:devices:/system.slice/oracle-database.service,5:cpu,cpuacct:/system.slice,1:name=systemd:/system.slice/oracle-database.service
9332 9332 TS 19 ora_vktm_orcl19c ora_vktm_orcl19 6:devices:/system.slice/oracle-database.service,5:cpu,cpuacct:/system.slice,1:name=systemd:/system.slice/oracle-database.service
Here CLS and PRI have incorrect values (TS and 19); the correct values are RR and 41.
To check the current cgroup values, use
cgget system.slice or cgget user.slice [depending on whether the customer environment shows cpu,cpuacct:/system.slice or cpu,cpuacct:/user.slice]
Run the commands below to set the correct value:
A) If it is user.slice:
echo 0 > /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us
echo 950000 > /sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.rt_runtime_us
Or
cgset -r cpu.rt_runtime_us=950000 user.slice    >> root user command
B) If it is system.slice:
echo 0 > /sys/fs/cgroup/cpu,cpuacct/user.slice/cpu.rt_runtime_us
echo 950000 > /sys/fs/cgroup/cpu,cpuacct/system.slice/cpu.rt_runtime_us
or
cgset -r cpu.rt_runtime_us=950000 system.slice
- Stop and start the database so that VKTM picks up the changes.
Check CLS and PRI again [CLS should be RR (real-time) and the priority 41]:
-bash-4.2$ ps -eLo 'pid tid cls pri cmd comm cgroup' | egrep 'TID|vktm' |grep -v grep
PID TID CLS PRI CMD COMMAND CGROUP
18876 18876 RR 41 ora_vktm_db193cdb ora_vktm_db193c 6:devices:/system.slice/oracle-database.service,5:cpu,cpuacct:/system.slice,1:name=systemd:/system.slice/oracle-database.service
19287 19287 RR 41 ora_vktm_orcl19c ora_vktm_orcl19 6:devices:/system.slice/oracle-database.service,5:cpu,cpuacct:/system.slice,1:name=systemd:/system.slice/oracle-database.service
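The before/after comparison of CLS can be automated by filtering the same ps output. The sketch below assumes the column order used above (pid tid cls pri cmd comm cgroup); `vktm_not_realtime` is an illustrative name, not an Oracle utility.

```shell
#!/usr/bin/env bash
# vktm_not_realtime: read the ps output described above on stdin and print
# "PID CLS PRI CMD" for every ora_vktm_* thread whose scheduling class is
# not RR. Exit status 0 means every VKTM thread found is in class RR.
vktm_not_realtime() {
  awk '$5 ~ /^ora_vktm_/ && $3 != "RR" { print $1, $3, $4, $5; bad = 1 }
       END { exit bad }'
}
```

A possible invocation is `ps -eLo 'pid tid cls pri cmd comm cgroup' | vktm_not_realtime`. Any line it prints is a VKTM thread still running in a non-real-time class such as TS.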
3.2.5. "nosuid" option :
Check whether the file system that holds the oradism binary is mounted with the "nosuid" option. If it is, remove that option.
With "nosuid" in effect, the setuid bit is ignored, so oradism will not run as root even if the permissions are correct and the kernel configuration is fine.
Check the mount options for this case.
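One way to test for the option is to look at the comma-separated option list of the mount that contains the binary. This is a minimal sketch; `has_nosuid` is a hypothetical helper name, and the option string is passed in as an argument so the check itself has no external dependencies.

```shell
#!/usr/bin/env bash
# has_nosuid OPTIONS: succeed if the comma-separated mount option list
# contains the exact option "nosuid".
has_nosuid() {
  case ",$1," in
    *,nosuid,*) return 0 ;;
    *)          return 1 ;;
  esac
}
```

On a system with util-linux, the option list could be obtained with `findmnt -no OPTIONS -T "$ORACLE_HOME/bin/oradism"` and passed to the function, for example `has_nosuid "$(findmnt -no OPTIONS -T "$ORACLE_HOME/bin/oradism")" && echo "mounted with nosuid"`.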
3.2.6. Oradism and Oracle binary permission difference:
Sometimes the oradism and oracle binaries have different group ownership (see Bug 31909951).
For example, the oracle and oradism binaries may have the permissions below:
e5pod-2vzxt1: -rwsr-s--x 1 oracle asmadmin 450717448 Sep 20 22:16 oracle
e5pod-2vzxt1: -rwsr-x--- 1 root oinstall 147752 Aug 26 17:13 oradism
Because the group of the oracle binary (asmadmin) does not match the group of oradism (oinstall), it fails with the error [dism:16].
As a solution, change oradism to be in the same group as the oracle binary.
If you still get ORA-800 after applying all of the changes above, you may be hitting one of the following bugs.
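The group comparison can be sketched as follows, assuming GNU `stat`; the function name `same_group` is illustrative only.

```shell
#!/usr/bin/env bash
# same_group FILE1 FILE2: succeed if both files have the same numeric group.
# Returns 2 if either file cannot be inspected.
same_group() {
  local g1 g2
  g1=$(stat -c '%g' "$1") || return 2   # GNU stat: %g = numeric group ID
  g2=$(stat -c '%g' "$2") || return 2
  [ "$g1" = "$g2" ]
}
```

For the case described here it would be called as `same_group "$ORACLE_HOME/bin/oracle" "$ORACLE_HOME/bin/oradism"`; a non-zero status of 1 indicates the mismatch.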
3.2.7. Apply this patch - Bug 34318125 - ORA-00800: SOFT EXTERNAL ERROR, ARGUMENTS: [SET PRIORITY FAILED] ON BG PROCESSES
ORA-00800: soft external error, arguments: [Set Priority Failed], [VKTM], [Check traces and OS configuration], [Check Oracle document and MOS notes], []
======= Dump for incident 763248 (ORA 800) ======
[TOC00003]
----- Beginning of Customized Incident Dump(s) -----
ksesethighpri: (ksb.c:9896) Failed to elevate VKTM's priority from 0 to 1, policy 3 >>>>>>>>>>>>>>>>>>>>>>
Error Info: Category(-2), Opname(skgdism_send), Loc(sp.c:setpr:0), ErrMsg(Operation not permitted) Dism(128)
----- End of Customized Incident Dump(s) -----
3.2.8. Apply this merge patch - Bug 34286265 Bug 34318125 Bug 34672698
ORA-00800: soft external error, arguments: [Set Priority Failed], [VKTM],[Check traces and OS configuration], [Check Oracle document and MOS notes],
======= Dump for incident 1041281 (ORA 800) ======
----- Beginning of Customized Incident Dump(s) -----
ksesethighpri: (ksb.c:9893) Failed to elevate VKTM's priority from 0 to 1, policy 3
Error Info: Category(-2), Opname(skgdism_create), Loc(sp.c:setpr:1), ErrMsg(Error 0) Dism(16)
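The two incident dumps in 3.2.7 and 3.2.8 differ in the Opname, ErrMsg and Dism fields of the "Error Info" line. The following is a minimal sketch for pulling those fields out of a trace file so they can be compared against the two signatures. It assumes the line layout shown in the dumps above, and the function name `dism_error_info` is made up for illustration.

```shell
#!/usr/bin/env bash
# dism_error_info [FILE...]: print "opname=... errmsg=... dism=..." for each
# "Error Info" style line found in the given trace files (or on stdin).
dism_error_info() {
  sed -n 's/.*Opname(\([^)]*\)).*ErrMsg(\([^)]*\)).*Dism(\([0-9]*\)).*/opname=\1 errmsg=\2 dism=\3/p' "$@"
}
```

For example, `dism_error_info /u01/app/oracle/diag/rdbms/boss/boss1/trace/boss1_vktm_93651.trc` would print one summary line per matching entry in that trace.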
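4. Open Questions
- Why is no error reported when the database is started with srvctl start database?
- If the cgroup configuration is the problem, why does the installation guide not require installing libcgroup-tools?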