首页 » ORACLE 9i-23ai » Troubleshooting failed to generate AWR snapshots at WRH$_SERVICE_STAT(gv$service_stats many ‘–UNKNOWN–‘)

Troubleshooting failed to generate AWR snapshots at WRH$_SERVICE_STAT(gv$service_stats many ‘–UNKNOWN–‘)

还是上次那个大量的expdp JOB数据库环境Oracle 11.2.0.4  4-nodes RAC, 查看库的AWR时发现已经很久没有生成snapshot, 分析发现AWR内部在收集gv$service_stats(X$KEWSSVCV)环节超时,因为X$KEWSSVCV 有百万行”–UNKNOWN–” service name的统计信息记录,虽然这个问题是因为datapump 的”BUG”((non-existent on MOS) )导致, 但这算是一个数据库长期不重启带来的“不好的影响”。

# db alert log

Thu May 07 22:09:20 2020
Suspending MMON slave action kewrmafsa_ for 82800 seconds

Note:
说明mmon可能暂停23个小时了。

# MMON trace

*** KEWROCISTMTEXEC - encountered error: (ORA-12751: cpu time or run time policy violation) 
 *** SQLSTR: total-len=295, dump-len=240, 
 STR={insert into WRH$_SERVICE_STAT (snap_id, dbid, instance_number, service_name_hash, stat_id, value) 
select :snap_id, :dbid, :instance_number, stat.service_name_hash, stat.stat_id, stat.value 
from v$active_services asvc, v$service_st}
DDE rules only execution for: ORA 12751
----- START Event Driven Actions Dump ----
---- END Event Driven Actions Dump ----
----- START DDE Actions Dump -----
Executing SYNC actions
Executing ASYNC actions
----- START DDE Action: 'ORA_12751_DUMP' (Sync) -----
CPU time exceeded 300 seconds
Time limit violation detected at:
ksedsts()+465<-kspol_12751_dump()+145<-dbgdaExecuteAction()+1065<-dbgerRunAction()+109<-dbgerRunActions()+4134<-dbgexPhaseII()+1873<-dbgexProcessError()+2680<-dbgeExecuteForError()+88<-dbgePostErrorKGE()+2136<-dbkePostKGE_kgsf()+71<-kgeade()+351<-kgerelv()+140
<-kgerev()+34<-kserec1()+170<-OCIKSEC()+189<-kewrose_oci_stmt_exec()+292<-kewrgwxf1_gwrsql_exft_1()+317<-kewrgwxf_gwrsql_exft()+496<-kewrews_execute_wr_sql()+52<-kewrftbs_flush_table_by_sql()+180<-kewrft_flush_table()+264<-kewrftec_flush_table_ehdlcx()+766
<-kewrfsvc_flush_svcstat()+45<-kewrft_flush_table()+397<-kewrftec_flush_table_ehdlcx()+766<-kewrfat_flush_all_tables()+898<-kewrfos_flush_onesnap()+170<-kewrfsc_flush_snapshot_c()+644<-kewrafs_auto_flush_slave()+769<-kebm_slave_main()+586<-ksvrdp()+1766<-opirip()+674
<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+230
Current Wait Stack:
 Not in wait; last wait ended 5 min 5 sec ago 
There are 3 sessions blocked by this session.

Note:
Taking an AWR snapshot executes many SQL statements to generate a new snapshot and has a built-in timeout mechanism, which causes timeouts between 300 and 900 seconds by default. As a result, if snapshot generation stalls, “Time limit violation” or “Runtime exceeded” messages and associated ORA-12751 errors are raised.

如果注意手动执行dbms_workload_repository.create_snapshot,也会因为内部的超时限制而被kill,终止当前操作。

禁用MMON对于CPU保护,使用下面的SQL, 但是不建议这么做


ALTER SYSTEM SET "_awr_mmon_cpuusage" = FALSE SCOPE=BOTH;

设置参数后,请尽快重新启动数据库以重新启用 MMON 进程,否则 AWR 快照生成将要自上次超时以来的 23 小时(82800 秒)后才能恢复。对于 RAC 数据库,请一次重新启动每个实例,直到所有实例都已重新启动。

查看SQL的执行计划

SQL_ID  7g732rx16j8jc, child number 3
-------------------------------------
insert into WRH$_SERVICE_STAT   (snap_id, dbid, instance_number,
service_name_hash, stat_id, value) select   :snap_id, :dbid,
:instance_number,   stat.service_name_hash, stat.stat_id, stat.value
from  v$active_services asvc, v$service_stats stat where
asvc.name_hash = stat.service_name_hash

Plan hash value: 2550932738

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name               | Starts | E-Rows | A-Rows |   A-Time   |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT          |                    |      1 |        |      0 |00:00:00.01 |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL  |                    |      1 |        |      0 |00:00:00.01 |       |       |          |
|   2 |   NESTED LOOPS            |                    |      1 |      1 |    174 |00:07:25.64 |       |       |          | 
|   3 |    MERGE JOIN CARTESIAN   |                    |      1 |      1 |    175 |00:00:00.01 |       |       |          |
|*  4 |     FIXED TABLE FULL      | X$KSWSASTAB        |      1 |      1 |      7 |00:00:00.01 |       |       |          |
|   5 |     BUFFER SORT           |                    |      7 |      1 |    175 |00:00:00.01 | 73728 | 73728 |          |
|*  6 |      FIXED TABLE FULL     | X$KEWSSMAP         |      1 |      1 |     28 |00:00:00.01 |       |       |          |
|*  7 |    FIXED TABLE FIXED INDEX| X$KEWSSVCV (ind:2) |    175 |      1 |    174 |00:07:42.11 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter(("KSWSASTABACT"=1 AND "INST_ID"=USERENV('INSTANCE')))
   6 - filter("M"."AGGID"=3)
   7 - filter(("S"."INST_ID"=USERENV('INSTANCE') AND "KSWSASTABNMH"="S"."SVCHSH" AND "S"."KEWSOFF"="M"."OFFST"))

Note:
看的出执行计划的7# 很少的数据在数据在和X$KEWSSVCV 做NLOOP时花费了7分钟多的时间。且估算的3个X$ FIXED TABLE也都很少的数据。

查看统计信息

SQL> select * from DBA_TAB_STATISTICS where table_name in    ('X$KSWSASTAB','X$KEWSSMAP','X$KEWSSVCV');

OWNER    TABLE_NAME     OBJECT_TYPE    NUM_ROWS     BLOCKS EMPTY_BLOCKS  AVG_SPACE  LAST_ANALYZED     GLO USE STATT STA
-------- -------------- ------------ ---------- ---------- ------------ ----------  ----------------- --- --- ----- ---
SYS      X$KSWSASTAB    FIXED TABLE
SYS      X$KEWSSMAP     FIXED TABLE
SYS      X$KEWSSVCV     FIXED TABLE

收集统计信息

SQL> select count(*) from X$KSWSASTAB;
  COUNT(*)
----------
         8

SQL> select count(*) from X$KEWSSMAP;
  COUNT(*)
----------
       265

SQL> select count(*) from X$KEWSSVCV;
  COUNT(*)
----------
   2576336

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SYS', 'X$KSWSASTAB', no_invalidate=>false);
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SYS', 'X$KEWSSMAP', no_invalidate=>false);
SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS('SYS', 'X$KEWSSVCV', no_invalidate=>false);

SQL> @sqlid 7g732rx16j8jc %
Show SQL text, child cursors and execution stats for SQLID 7g732rx16j8jc child %

HASH_VALUE PLAN_HASH_VALUE  CH# SQL_TEXT
---------- --------------- ---- ------------------------------------------------------------------------------------------------------------------------------------------------------
1114153516      4168499536    0 insert into WRH$_SERVICE_STAT   (snap_id, dbid, instance_number,    service_name_hash, stat_id, value) select   :snap_id, :dbid, :instance_number,
                                stat.service_name_hash, stat.stat_id, stat.value from  v$active_services asvc, v$service_stats stat where  asvc.name_hash = stat.service_name_hash

1114153516      4168499536    2 insert into WRH$_SERVICE_STAT   (snap_id, dbid, instance_number,    service_name_hash, stat_id, value) select   :snap_id, :dbid, :instance_number,
                                stat.service_name_hash, stat.stat_id, stat.value from  v$active_services asvc, v$service_stats stat where  asvc.name_hash = stat.service_name_hash



 CH# PARENT_HANDLE    OBJECT_HANDLE     PLAN_HASH     PARSES   H_PARSES EXECUTIONS    FETCHES ROWS_PROCESSED ROWS_PER_FETCH    CPU_SEC CPU_SEC_EXEC    ELA_SEC       LIOS       PIOS      SORTS USERS_EXECUTING
---- ---------------- ---------------- ---------- ---------- ---------- ---------- ---------- -------------- -------------- ---------- ------------ ---------- ---------- ---------- ---------- ---------------
   0 00000000BE2A17D0 0000000101E08018 4168499536         22          8         22          0           4256                   .272018   .012364455    .630342       1040        138          0               0
   2 00000000BE2A17D0 00000000BD655CC0 4168499536          1          2          1          0            224                   .016001      .016001    .056783         77         10          0               0

SQL_ID  7g732rx16j8jc, child number 2
-------------------------------------
insert into WRH$_SERVICE_STAT   (snap_id, dbid, instance_number,
service_name_hash, stat_id, value) select   :snap_id, :dbid,
:instance_number,   stat.service_name_hash, stat.stat_id, stat.value
from  v$active_services asvc, v$service_stats stat where
asvc.name_hash = stat.service_name_hash

Plan hash value: 4168499536

---------------------------------------------------------------------------------------------
| Id  | Operation                  | Name               | E-Rows |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------
|   0 | INSERT STATEMENT           |                    |        |       |       |          |
|   1 |  LOAD TABLE CONVENTIONAL   |                    |        |       |       |          |
|*  2 |   HASH JOIN                |                    |    166 |   965K|   965K| 1081K (0)|
|   3 |    NESTED LOOPS            |                    |    212 |       |       |          |
|*  4 |     FIXED TABLE FULL       | X$KSWSASTAB        |      8 |       |       |          |
|*  5 |     FIXED TABLE FIXED INDEX| X$KEWSSVCV (ind:1) |     28 |       |       |          |
|*  6 |    FIXED TABLE FULL        | X$KEWSSMAP         |     28 |       |       |          |
---------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("S"."KEWSOFF"="M"."OFFST")
   4 - filter(("KSWSASTABACT"=1 AND "INST_ID"=USERENV('INSTANCE')))
   5 - filter(("S"."INST_ID"=USERENV('INSTANCE') AND "KSWSASTABNMH"="S"."SVCHSH"))
   6 - filter("M"."AGGID"=3)

Note:
收集统计信息后,SQL执行非常快执行计划改变。

查看FIXED TABLE INDEX

SQL> select * from v$indexed_fixed_column where table_name='X$KEWSSVCV' ;

TABLE_NAME                     INDEX_NUMBER COLUMN_NAME                    COLUMN_POSITION
------------------------------ ------------ ------------------------------ ---------------
X$KEWSSVCV                                2 KEWSOFF                                      0
X$KEWSSVCV                                1 SVCHSH                                       0

Note:
Fixed index在dba_indexes中查询不到,它是一种内存中的C结构对象,索引名是以编号形式存在,如ind:2 , ind:1 .., 收集统计信息前访问X$KEWSSVCV使用的是2号索引,在KEWSOFF列上的索引,该列是大量重复的序列,选择率并不好, 而收集统计信息后使用了1号索引(svchsh列),该列像是service hash值,唯一性特别好。使用1号索引查看SQL的平均执行时间在毫秒级。

检查MMON 是否挂起

SQL> oradebug unit_test kebm_dmp_slv_attrs kewrmafsa_
Status: 3
Flags: 0
Runtime limit: 900
CPU time limit: 300
Violations: 28
Suspended until: 1589036765  <<<---- not zero indicates that MMON is still suspended.


SQL> EXECUTE DBMS_WORKLOAD_REPOSITORY.CREATE_SNAPSHOT();
PL/SQL procedure successfully completed.

Note:
这时再次手动生成AWR已经可以,耗时可以接受,下面清除MMON的suspended状态

SQL> oradebug unit_test kebm_set_slv_attrs kewrmafsa_ retain retain retain retain 0 0
Modified attributes of kewrmafsa_ (slave id 13)

SQL> oradebug unit_test kebm_dmp_slv_attrs kewrmafsa_
Status: 3
Flags: 0
Runtime limit: 900
CPU time limit: 300
Violations: 0
Suspended until: 0

至此问题已经解决,AWR已经可以自动生成,再看看gv$service_stats(X$KEWSSVCV)那么大是什么原因。 Automatic Workload Repository (kewStatistics SerViCV ? (I guess)

SQL> select inst_id,service_name,count(1) from gv$service_stats
  2   group by inst_id,service_name
  3  order by 1,2,count(1) desc;

   INST_ID SERVICE_NAME                                                       COUNT(1)
---------- ---------------------------------------------------------------- ----------
         1 --UNKNOWN--                                                         2576100
         1 SYS$BACKGROUND                                                           28
         1 SYS$USERS                                                                28
         1 SYS.KUPC$C_1_20200413145208                                              27
         1 SYS.KUPC$C_1_20200413152721                                              28
         1 SYS.KUPC$C_1_20200509102215                                               7
         1 SYS.KUPC$S_1_20200413145208                                              28
         1 SYS.KUPC$S_1_20200413152721                                              27
         1 SYS.KUPC$S_1_20200509102215                                               8
         1 cdr.hebei.mobile.com                                                     28
         1 cdrXDB                                                                   27
         2 --UNKNOWN--                                                              56
         2 SYS$BACKGROUND                                                           28
         2 SYS$USERS                                                                28
         2 cdr.hebei.mobile.com                                                     28
         2 cdrXDB                                                                   28
         3 --UNKNOWN--                                                              56
         3 SYS$BACKGROUND                                                           28
         3 SYS$USERS                                                                28
         3 cdr.hebei.mobile.com                                                     28
         3 cdrXDB                                                                   28
         4 --UNKNOWN--                                                             112
         4 SYS$BACKGROUND                                                           28
         4 SYS$USERS                                                                28
         4 cdr.hebei.mobile.com                                                     28
         4 cdrXDB                                                                   28

这是个4节点的RAC, 因为EXPDP导出任务都在节点1, 所以X$KEWSSVCV 内存表记录在节点1非常明显, X$KEWSSVCV中的”–UNKNOWN–” 服务的问题(没有记录在MOS中的BUG)是由expdp每次执行结束后插入,测试该问题在19c中都一直存在, 并且X$ FIXED TABLE只允许查询不允许任何DML,即使是SYSDBA也不可以,所以如果上面收集了统计信息后问题依旧存在(即使使用了正常的索引)或SQL性能慢的不可接受,建议重启实例释放X$ fixed table.

打赏

,

对不起,这篇文章暂时关闭评论。