Troubleshooting Oracle 11.2.0.4 frequently ORA-8103 on Local non-prefixed index
最近遇到了2个客户出现在11.2.0.4环境中频繁出现ora-8103的问题,基本上都是索引对象object mismatch, 重建后过段时间会再现, 该类问题使用rman validate logical 还无法发现,算是当前oracle软件的一个未知bug.
db alert log
Tue Aug 02 07:12:29 2022 Archived Log entry 701841 added for thread 1 sequence 333804 ID 0x6d3dadc5 dest 1: Tue Aug 02 07:15:00 2022 Errors in file /oracle/app/oracle/diag/rdbms/anbobl/anbobl1/trace/anbobl1_ora_31343.trc: ORA-08103: object no longer exists Tue Aug 02 07:15:00 2022 Errors in file /oracle/app/oracle/diag/rdbms/anbobl/anbobl1/trace/anbobl1_ora_31599.trc: ORA-08103: object no longer exists Tue Aug 02 07:15:01 2022
trace file
*** SESSION ID:(8467.28134) 2022-08-02 10:45:01.797 OBJD MISMATCH typ=6, seg.obj=9603279, diskobj=9603277, dsflg=100000, dsobj=9603279, tid=9603279, cls=1 *** 2022-08-02 10:45:01.798 dbkedDefDump(): Starting a non-incident diagnostic dump (flags=0x0, level=3, mask=0x0) ----- Error Stack Dump ----- ORA-08103: object no longer exists ----- Current SQL Statement for this session (sql_id=4np3fkhwcwv1z) ----- update u1.xxx set APPLYOID='88872xxxx' where SUBSID='31722xxxxx6' and DAYID='20220802' call stack -------------- kgeselv ksesecl0 kcbz_check_objd_typ kcbzib kcbgtcr ktrget3 ktrget2 kdsgrp qetlbr
Note:
From the above we can tell the following:
data segment object_id = 9603279
On disk DATA_OBJECT_ID (diskobj) = 9603277
In memory buffer DATA_OBJECT_ID (dsobj) = 9603279
查看SQL的执行计划
explian plan for update u1.xxx from u1.xxx tset APPLYOID='88872xxxx' where SUBSID='31722xxxxx6' and DAYID='20220802'; Plan hash value: 3749668329 ------------------------------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart| Pstop | ------------------------------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 32 | 6 (0)| 00:00:01 | | | | 1 | PARTITION RANGE SINGLE | | 1 | 32 | 6 (0)| 00:00:01 | 31 | 31 | |* 2 | TABLE ACCESS BY LOCAL INDEX ROWID| TC_Sxxxxxxxxxxxxxxxx | 1 | 32 | 6 (0)| 00:00:01 | 31 | 31 | |* 3 | INDEX RANGE SCAN | IDX_xxxxxxxxxxxxxxxxxxxT | 2 | | 3 (0)| 00:00:01 | 31 | 31 | ------------------------------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - filter("DAYID"=20220802) 3 - access("SUBSID"='31722xxxx6')
note:
可以确认是使用了index range scan, Local索引分区的#31,回表分区表#31。 该分区是20220802.
该表是个基于DAYID创建的每天一个分区, 只在subsid列创建了local 索引(未带分区列). data_object_id 9603277和9603279分别对应的是表段 20220801和20220802 分区。 从dba_objects中查看对象OBJ#=OBJD#,CREATED=LAST_DDL_TIME,判断相关对象没有做过truncate或交换分区。
判断表中是否有该记录?
select /*+full(t)*/ * from u1.xxx t where SUBSID='31722xxxxx6' and DAYID='20220802'
使用full sql hint查询table segment中确实无该记录返回。
确认索引KEY记录的位置
select distinct dbms_rowid.rowid_object(rowid) dsobj, dbms_rowid.rowid_relative_fno(rowid) file#, dbms_rowid.rowid_block_number(rowid) blk#, dbms_rowid.rowid_row_number(rowid) row# from u1.xxx tset APPLYOID='88872xxxx' where SUBSID='31722xxxxx6' and distinct dbms_rowid.rowid_object(rowid)=9603279;
Note:
可以从记录中找到指向9603279# 的rowid 的数据块地址。
确认数据块指向的对象
select owner, segment_name, partition_name, tablespace_name from dba_extents where relative_fno = &v_dba_rfile and &v_dba_block between block_id and block_id + blocks - 1 /
从上面的文件号和块号 确认了数据段指向了9603277#, 到这里可以确认的索引 31#分区(20220802) 通过key返回的ROWID指向了表段并不是31#表分区(DATA_OBJECT_ID 9603279#),而是30#分区表20220801(DATA_OBJECT_ID =9603277#)。 不符合local partition。
确认表分区20220801中是否有上面索引中记录
可以dump 上面rowid 返回的文件号, 块号 -- or 查询20220801分区,指定partition 关键字 ,确认使用分区索引 -- dump block sys@ANBOBL>alter system dump datafile xxx block xxx; tab 0, row 5, @0x9f3 tl: 65 fb: --H-FL-- lb: 0x0 cc: 7 col 0: [13] 33 31 37 32 32 37 39 38 37 37 33 38 36 sys@ANBOBL> select UTL_RAW.CAST_TO_VARCHAR2(replace('33 31 37 32 32 37 39 38 37 37 33 38 36',' ','')) from dual; UTL_RAW.CAST_TO_VARCHAR2(REPLACE('33313732323739383737333836','','')) ------------------------------------------------------------------------------- 3172279877386
note:
确认表段上数据确实存在。
确认索引分区20220801中是否有上面索引中记录
select distinct dbms_rowid.rowid_object(rowid) dsobj, dbms_rowid.rowid_relative_fno(rowid) file#, dbms_rowid.rowid_block_number(rowid) blk#, dbms_rowid.rowid_row_number(rowid) row# from u1.xxx partition(p20220801) t where SUBSID='31722xxxxx6'
Note:
20220801 索引分区中,无分区表20220801中 KEY对应的记录。
之前这个问题已排除过_part_access_version_by_number=false, 延迟段创建、交换分区特性。
到这里总结一下,分区表20220801中的一个KEY值 在对应的local 索引分区20220801中缺失,错误的记录到了索引分区20220802中。
查询alert log中历史错误对象
grep -B 1 "ORA-08103" alert*.log|egrep '^Errors in file'|awk '{print $4}'|sed 's/://'|while read LINE ; do echo $LINE; grep "ORA-08103" -A 2 $LINE; done;
Note:
可以确认发生在该相同的表上2个索引, 索引同时都是不带分区列的非前缀本地索引。每次临时解决都可以重建相邻的2个问题索引分区;
该问题应该属于ORACLE 11.2.0.4的未知bug. 彻底解决可以修改创建 Local prefixed index
对不起,这篇文章暂时关闭评论。