Troubleshooting Query fail with Ora-600 [kdsgrp1] ora-600 [qertbFetchByRowID]
ora-600 [kdsgrp1] 发生fetch 操作查找预期的行失败时, 常有ora-600 [qertbFetchByRowID] 伴随,这次遇到的一个aix 10.2.0.4 rac案例有 ora-7445 [kdr9ir2F0rst4srp0+0068] 和 ora-600 [25027], [6], [825242163] 陪伴, 而且 好几个ora-600 [25027] 的rdba 莫名指向了一些不相干数据块(这点是通过truncate 原表后25027没再出现推断)
The ORA-600 [25027]
ERROR:
Format: ORA-600 [25027] [a] [b]
VERSIONS:
versions 9.2 and above
ARGUMENTS:
Arg [a] Tablespace Number (TSN)
Arg [b] Decimal Relative Data Block Address (RDBA)
The ora-600 [qertbFetchByRowID]
cause Index corruption.
The ora-600 [kdsgrp1]
The ora-600 [kdsgrp1] error is thrown when a fetch operation fails to find the expected row. The error is hit in memory and so may be a memory only error or an error that results from corruption on disk.
This error may indicate (but is not restricted to) any of the following conditions:
1. Lost writes
2. Parallel DML issues
3. Index corruption
4. Data block corruption
5. Consistent read [CR] issues
6. Buffer cache corruption
7. Bugs (A full list of known issues is given in Note 285586.1 - ORA-600 [kdsgrp1] )
8. Storage device issue(ie. EMC SRDF/A DR switchover )
etc..
The following is ora-600 [kdsgrp1] trace content.
oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/product
System name: AIX
Node name: par3a
Release: 3
Version: 5
Machine: 00C45CC44C00
Instance name: zyk1
Redo thread mounted by this instance: 1
Oracle process number: 547
Unix process pid: 2470174, image: oracle@par3a
*** ACTION NAME:() 2014-08-04 10:30:04.677
*** MODULE NAME:(xxxx.exe) 2014-08-04 10:30:04.677
*** SERVICE NAME:(xxx) 2014-08-04 10:30:04.677
*** SESSION ID:(1057.655) 2014-08-04 10:30:04.676
row 0183c4b8.3 continuation at
file# 6 block# 246968 slot 4 not found
**************************************************
KDSTABN_GET: 1 ..... ntab: 76
curSlot: 4 ..... nrows: 8194
**************************************************
*** 2014-08-04 10:30:04.685
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Current SQL statement for this session:
select BJZDRYBH,CJSJ,...
from a_view
where ...
----- Call Stack Trace -----
calling call entry argument values in hex
location type point (? means dubious value)
-------------------- -------- -------------------- ----------------------------
ksedst+001c bl ksedst1 000000101 ? 054435000 ?
ksedmp+0290 bl ksedst 104A2C690 ?
ksfdmp+0018 bl 03F26C3C
kgerinv+00dc bl _ptrgl
kgeasnmierr+004c bl kgerinv 000000000 ? 000000001 ? 2E700000000 ? 000000000 ? 11048AF40 ?
kdsgrp+0460 bl 01F94108
kdsfbr+0268 bl kdsgrp 1051B12D8 ? 4330000000000094 ? 1105111C8 ?
qertbFetchByRowID+0 bl 03F276FC9d0
kpofrws+019c bl _ptrgl
opifch2+13dc bl 03F26538
opifch+003c bl opifch2 700000203CAB6EC ? 000000000 ? FFFFFFFFFFF9DD0 ?
opiodr+0ae0 bl _ptrgl
ttcpip+1020 bl _ptrgl
opitsk+1124 bl 01F96AC8
opiino+0990 bl opitsk 0FFFFD8C0 ? 000000000 ?
opiodr+0ae0 bl _ptrgl
opidrv+0484 bl 01F95914
sou2o+0090 bl opidrv 3C02D99B7C ? 4A0145928 ? FFFFFFFFFFFF7C0 ?
opimai_real+01bc bl 01F93294
main+0098 bl opimai_real 000000000 ? 000000000 ?
__start+0098 bl main 000000000 ? 000000000 ?
注意以下信息,从rdba 0183c4b8 或 file#+block#,都指向了这个block的上的第5行数据没有找到(slot start with 0)。通常是索引错误,在索引回表过程中出错。
row 0183c4b8.3 continuation at file# 6 block# 246968 slot 4 not found
做了以下尝试(部分数据是在本地测试演示)
1, alter system flush buffer_cache;
Re-execute the sql , the error still
2, Disabling rowCR setting _row_cr=FALSE
3, Try to Recreate index
根据问题SQL,确定了执行计划中使用索引回表访问时出错。rebuild index ONLINE 或者drop and recreate the index(es) .( 生产环境需要时间窗口,避开业务期),结果index drop 后,重建报ora-3113 异常中断了。
4, Analyze index xxx validate structure online;
try all index report ORA-1499 or ora-600,
5, Analyze table <table_name> validate structure online;
report ORA-1499 error.
6, Try to Exp and CTAS with set 10231 event, Fails ora-3113.
If analyze and exporting the table (in the presence of chained rows) both report no errors then this should be considered a consistent read issue.
7, Try to skip corrupted block using rowid
# 下面有使用Tanel Poder大师的脚本
sys@ANBOB>@dba 183c4b8 RFILE# BLOCK# BIGFILE_BLOCK# -------------------- -------------------- -------------------- DUMP_CMD ---------------------------------------------------------------------------------------- 6 246968 25412792 -- alter system dump datafile 6 block 246968 7.1 Try to dump block report ora-3113 error. sys@ANBOB>@dba_mk 6 246968 RDBA -------------------- 0x183C4B8 sys@ANBOB>@dba2 6 246968 OWNER SEGMENT_NAME PARTITION_NAME TABLESPACE_NAME EXTENT_ID ------------- ------------------ ------------------ ------------------ ----------- anbob tt USERS 0 sys@ANBOB>@o2 anbob.tt owner object_name object_typ status OID D_OID CREATED LAST_DDL_TIME --------------- ------------------ ---------- --------- ----------- --------- ------------------- ------------------- ANBOB TT TABLE VALID 108933 109805 2014-07-01 14:22:08 2014-07-11 11:38:15 sys@ANBOB>select rowid r,id from anbob.tt; R ID ------------------ -------------------- AAAaztAAEAAAKbHAAA 1 AAAaztAAEAAAKbHAAB 2 AAAaztAAEAAAKbHAAC 3 sys@ANBOB>@rowid AAAaztAAEAAAKbHAAA Show file, block, row numbers from rowid AAAaztAAEAAAKbHAAA... RFILE# BLOCK# ROW# ROWID_DBA -------------------- -------------------- -------------------- -------------------- DUMP_COMMAND ---------------------------------------------------------------------------------- 4 42695 0 0x100A6C7 alter system dump datafile 4 block 42695; -- @dump 4 42695 . sys@ANBOB>@rowid_mk 109805 4,42695,0 get rowid data object id 109805 ROWID_STR ------------------ AAAaztAAEAAAKbHAAA sys@ANBOB>select * from anbob.tt where rowid='AAAaztAAEAAAKbHAAA'; ID T -------------------- ------------------- 1 1990-01-02 00:00:00 sys@ANBOB>@rowid_mk 109805 4,42695,2 get rowid data object id 109805 ROWID_STR ------------------ AAAaztAAEAAAKbHAAC sys@ANBOB>select * from anbob.tt where rowid='AAAaztAAEAAAKbHAAC'; ID T -------------------- ------------------- 3 1990-01-04 00:00:00
Note”
到这里我们就可以用以上的方法来跳过corrupted block,甚至可以跳过块上的某行,把损失降到最低。
# if in archivelog mode create table t_bak nologging as select * from t where rowid<''; insert /*+append*/ into t_bak select * from t where rowid>''; truncate table t; insert into t select* from t_bak;
TIP:
该列上如果有pk or unique index or 其它index ,可以尝试利用索引来找出corrupted block上的行记录部分信息, 找以前的备份或手工补录。其它方法也可以使用bbed 标注block 状态。
select /*+index(t xxx)*/ COLUMN from t where rowid='';
附上一些相关的sql
[oracle@db231 ~]$ cat dba.sql set verify off col rfile# new_value v_dba_rfile col block# new_value v_dba_block col bigfile_block# new_value v_bigfile_block col dba_object head object for a40 truncate col dba_DBA head DBA for a20 select dbms_utility.data_block_address_file(to_number('&1','XXXXXXXXXX')) RFILE#, dbms_utility.data_block_address_block(to_number('&1','XXXXXXXXXX')) BLOCK#, TO_NUMBER('&1','XXXXXXXXXX') bigfile_block#, '-- alter system dump datafile '||dbms_utility.data_block_address_file(to_number('&1','XXXXXXXXXX')) ||' block '||dbms_utility.data_block_address_block(to_number('&1','XXXXXXXXXX')) dump_cmd from dual; [oracle@db231 ~]$ cat dba2.sql col SEGMENT_NAME for a30 col owner for a30 col partition_name for a30 col tablespace_name for a30 select owner, segment_name, partition_name, tablespace_name, extent_id from dba_extents where file_id = &1 and &2 between block_id and block_id + blocks - 1; [oracle@db231 ~]$ cat dba_mk.sql set verify off def fno=&1 def blkno=&2 select lpad('0x'||trim(to_char(dbms_utility.MAKE_DATA_BLOCK_ADDRESS(&fno, &blkno), 'XXXXXXXX')), 10) rdba from dual; [oracle@db231 ~]$ cat rowid.sql prompt Show file, block, row numbers from rowid &1.... def rowid=&1 select dbms_rowid.ROWID_RELATIVE_FNO('&rowid') rfile# , dbms_rowid.ROWID_BLOCK_NUMBER('&rowid') block# , dbms_rowid.ROWID_ROW_NUMBER('&rowid') row# , lpad('0x'||trim(to_char(dbms_utility.MAKE_DATA_BLOCK_ADDRESS(dbms_rowid.ROWID_RELATIVE_FNO('&rowid') , dbms_rowid.ROWID_BLOCK_NUMBER('&rowid')), 'XXXXXXXX')), 10) rowid_dba , 'alter system dump datafile '||dbms_rowid.ROWID_RELATIVE_FNO('&rowid')||' block '|| dbms_rowid.ROWID_BLOCK_NUMBER('&rowid')||'; -- @dump '|| dbms_rowid.ROWID_RELATIVE_FNO('&rowid')||' '|| dbms_rowid.ROWID_BLOCK_NUMBER('&rowid')||' .' dump_command from dual / [oracle@db231 ~]$ cat rowid_mk.sql Prompt get rowid data object id &1 set verify off def doid=&1 def blk=&2 select DBMS_ROWID.ROWID_CREATE(1, &doid, &blk) rowid_str from dual;
Quick method to identify table/index mismatch when analyze validate structure cascade takes significant time (文档 ID 1554054.1)
Script to identify TABLE/INDEX inconsistency
Use the next procedure to identify TABLE/INDEX inconsistency, which is based on the above query, for all indexes in a table:
create or replace procedure analyze_quick(owner_table varchar2, name_table varchar2, name_index varchar2 default null) is s varchar2(30000); num_indexes number := 0; sum_hash number; begin for i in (select a.owner, a.index_name, b.column_name from dba_indexes a, dba_ind_columns b where a.table_owner = upper(owner_table) and a.table_name = upper(name_table) and (a.index_name = upper(name_index) or name_index is null) and a.index_type not in ('IOT - TOP' ,'LOB' ,'FUNCTION-BASED NORMAL' ,'FUNCTION-BASED DOMAIN' ,'CLUSTER') and a.owner = b.index_owner and a.index_name = b.index_name and a.table_name = b.table_name and b.column_position = 1) loop num_indexes := num_indexes+1; s := 'select /*+ full(t1) parallel */ sum(ora_hash(rowid)) from '; s := s || owner_table || '.' || name_table || ' t1 where ' || i.column_name ||' is not null MINUS '; s := s || 'select /*+ index_ffs(t '|| i.index_name||') */ sum(ora_hash(rowid)) from '; s := s || owner_table || '.' || name_table || ' t where ' || i.column_name ||' is not null'; begin execute immediate s into sum_hash; if sum_hash > 0 then raise_application_error(-20220,'TABLE/INDEX MISMATCH detected!! Table: ' ||upper(owner_table)||'.'||upper(name_table) ||' Index: ' ||upper(i.index_name)); end if; exception when no_data_found then null; -- no_data_found means that there is not inconsistency end; end loop; if num_indexes = 0 and name_index is not null then raise_application_error(-20221,'Check was not executed. Index '||upper(name_index)||' does not exist for table '||upper(name_table)|| ' or table does not exist'); elsif num_indexes = 0 then raise_application_error(-20222,'Check was not executed. No INDEXES with index_type=NORMAL found for table '||upper(name_table)|| ' or table does not exist'); end if; end; / SYNTAX Identify TABLE_NAME inconsistency with all its indexes: SQL> execute analyze_quick('&TABLE_OWNER','&TABLE_NAME') If an inconsistency is detected, the procedure will stop and will not check the rest of indexes for TABLE_NAME. Identify TABLE_NAME inconsistency with index INDEX_NAME: SQL> execute analyze_quick('&TABLE_OWNER','&TABLE_NAME', '&INDEX_NAME') Execution example when TABLE/INDEX inconsistency is identified: SQL> execute analyze_quick('SCOTT','EMP') BEGIN analyze_faster('SYS','OBJ$'); END; * ERROR at line 1: ORA-20220: TABLE/INDEX MISMATCH detected!! Table: SCOTT.EMP Index: I_EMP_EMPNO ORA-06512: at "SYS.ANALYZE_QUICK", line 34 ORA-06512: at line 1
References:
Causes and Solutions for ora-600 [kdsgrp1] (文档 ID 1332252.1)
对不起,这篇文章暂时关闭评论。