首页 » ORACLE 9i-23ai » Troubleshooting Query fail with Ora-600 [kdsgrp1] ora-600 [qertbFetchByRowID]

Troubleshooting Query fail with Ora-600 [kdsgrp1] ora-600 [qertbFetchByRowID]

ora-600 [kdsgrp1] 发生fetch 操作查找预期的行失败时, 常有ora-600 [qertbFetchByRowID] 伴随,这次遇到的一个aix 10.2.0.4 rac案例有 ora-7445 [kdr9ir2F0rst4srp0+0068] 和 ora-600 [25027], [6], [825242163] 陪伴, 而且 好几个ora-600 [25027] 的rdba 莫名指向了一些不相干数据块(这点是通过truncate 原表后25027没再出现推断)

The ORA-600 [25027]
ERROR:
Format: ORA-600 [25027] [a] [b]
VERSIONS:
versions 9.2 and above
ARGUMENTS:
Arg [a] Tablespace Number (TSN)
Arg [b] Decimal Relative Data Block Address (RDBA)

The ora-600 [qertbFetchByRowID]
cause Index corruption.

The ora-600 [kdsgrp1]
The ora-600 [kdsgrp1] error is thrown when a fetch operation fails to find the expected row. The error is hit in memory and so may be a memory only error or an error that results from corruption on disk.

This error may indicate (but is not restricted to) any of the following conditions:

1. Lost writes 
2. Parallel DML issues 
3. Index corruption 
4. Data block corruption 
5. Consistent read [CR] issues  
6. Buffer cache corruption  
7. Bugs (A full list of known issues is given in Note 285586.1 - ORA-600 [kdsgrp1] )
8. Storage device issue(ie. EMC SRDF/A DR switchover )
etc..

The following is   ora-600 [kdsgrp1]  trace content.

 
  oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options
ORACLE_HOME = /u01/product
System name:	AIX
Node name:	par3a
Release:	3
Version:	5
Machine:	00C45CC44C00
Instance name: zyk1
Redo thread mounted by this instance: 1
Oracle process number: 547
Unix process pid: 2470174, image: oracle@par3a

*** ACTION NAME:() 2014-08-04 10:30:04.677
*** MODULE NAME:(xxxx.exe) 2014-08-04 10:30:04.677
*** SERVICE NAME:(xxx) 2014-08-04 10:30:04.677
*** SESSION ID:(1057.655) 2014-08-04 10:30:04.676
            row 0183c4b8.3 continuation at
            file# 6 block# 246968 slot 4 not found
**************************************************
KDSTABN_GET: 1 ..... ntab: 76
curSlot: 4 ..... nrows: 8194
**************************************************
*** 2014-08-04 10:30:04.685
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kdsgrp1], [], [], [], [], [], [], []
Current SQL statement for this session:
select BJZDRYBH,CJSJ,...
from a_view
where ...

----- Call Stack Trace -----
calling              call     entry                argument values in hex      
location             type     point                (? means dubious value)     
-------------------- -------- -------------------- ----------------------------
ksedst+001c          bl       ksedst1              000000101 ? 054435000 ?
ksedmp+0290          bl       ksedst               104A2C690 ?
ksfdmp+0018          bl       03F26C3C             
kgerinv+00dc         bl       _ptrgl               
kgeasnmierr+004c     bl       kgerinv              000000000 ? 000000001 ?   2E700000000 ? 000000000 ?   11048AF40 ?
kdsgrp+0460          bl       01F94108             
kdsfbr+0268          bl       kdsgrp               1051B12D8 ? 4330000000000094 ?  1105111C8 ?
qertbFetchByRowID+0  bl       03F276FC9d0                                                
kpofrws+019c         bl       _ptrgl               
opifch2+13dc         bl       03F26538             
opifch+003c          bl       opifch2              700000203CAB6EC ? 000000000 ?  FFFFFFFFFFF9DD0 ?
opiodr+0ae0          bl       _ptrgl               
ttcpip+1020          bl       _ptrgl               
opitsk+1124          bl       01F96AC8             
opiino+0990          bl       opitsk               0FFFFD8C0 ? 000000000 ?
opiodr+0ae0          bl       _ptrgl               
opidrv+0484          bl       01F95914             
sou2o+0090           bl       opidrv               3C02D99B7C ? 4A0145928 ?   FFFFFFFFFFFF7C0 ?
opimai_real+01bc     bl       01F93294             
main+0098            bl       opimai_real          000000000 ? 000000000 ?
__start+0098         bl       main                 000000000 ? 000000000 ?

注意以下信息,从rdba 0183c4b8 或 file#+block#,都指向了这个block的上的第5行数据没有找到(slot start with 0)。通常是索引错误,在索引回表过程中出错。

row 0183c4b8.3 continuation at
file# 6 block# 246968 slot 4 not found

做了以下尝试(部分数据是在本地测试演示)

1, alter system flush buffer_cache;
Re-execute the sql , the error still

2, Disabling rowCR setting _row_cr=FALSE

3, Try to Recreate index

根据问题SQL,确定了执行计划中使用索引回表访问时出错。rebuild index ONLINE 或者drop and recreate the index(es) .( 生产环境需要时间窗口,避开业务期),结果index drop 后,重建报ora-3113 异常中断了。

4, Analyze index xxx validate structure online;
try all index report ORA-1499 or ora-600,

5, Analyze table <table_name> validate structure online;
report ORA-1499 error.

6, Try to Exp and CTAS with set 10231 event, Fails ora-3113.

If analyze and exporting the table (in the presence of chained rows) both report no errors then this should be considered a consistent read issue.

7, Try to skip corrupted block using rowid

# 下面有使用Tanel Poder大师的脚本

sys@ANBOB>@dba 183c4b8
              RFILE#               BLOCK#       BIGFILE_BLOCK#
-------------------- -------------------- --------------------
DUMP_CMD
----------------------------------------------------------------------------------------
                   6               246968             25412792
-- alter system dump datafile 6 block 246968

7.1 Try to dump block report ora-3113 error.

sys@ANBOB>@dba_mk 6 246968
RDBA
--------------------
 0x183C4B8

sys@ANBOB>@dba2 6 246968
OWNER         SEGMENT_NAME       PARTITION_NAME     TABLESPACE_NAME      EXTENT_ID
------------- ------------------ ------------------ ------------------ -----------
anbob         tt                                  USERS                        0

sys@ANBOB>@o2 anbob.tt
owner           object_name        object_typ status            OID     D_OID CREATED             LAST_DDL_TIME
--------------- ------------------ ---------- --------- ----------- --------- ------------------- -------------------
ANBOB           TT                 TABLE      VALID          108933    109805 2014-07-01 14:22:08 2014-07-11 11:38:15

sys@ANBOB>select rowid r,id from anbob.tt;

R                                    ID
------------------ --------------------
AAAaztAAEAAAKbHAAA                    1
AAAaztAAEAAAKbHAAB                    2
AAAaztAAEAAAKbHAAC                    3

sys@ANBOB>@rowid AAAaztAAEAAAKbHAAA
Show file, block, row numbers from rowid AAAaztAAEAAAKbHAAA...

              RFILE#               BLOCK#                 ROW# ROWID_DBA
-------------------- -------------------- -------------------- --------------------
DUMP_COMMAND
----------------------------------------------------------------------------------
                   4                42695                    0  0x100A6C7

alter system dump datafile 4 block 42695; -- @dump 4 42695 .

sys@ANBOB>@rowid_mk 109805 4,42695,0
get rowid data object id 109805
ROWID_STR
------------------
AAAaztAAEAAAKbHAAA

sys@ANBOB>select * from  anbob.tt where rowid='AAAaztAAEAAAKbHAAA';
                  ID T
-------------------- -------------------
                   1 1990-01-02 00:00:00

sys@ANBOB>@rowid_mk 109805 4,42695,2
get rowid data object id 109805
ROWID_STR
------------------
AAAaztAAEAAAKbHAAC

sys@ANBOB>select * from  anbob.tt where rowid='AAAaztAAEAAAKbHAAC';
                  ID T
-------------------- -------------------
                   3 1990-01-04 00:00:00

Note”

到这里我们就可以用以上的方法来跳过corrupted block,甚至可以跳过块上的某行,把损失降到最低。

# if in archivelog mode

create table t_bak nologging as  select * from t where rowid<'';
insert /*+append*/ into t_bak select * from t where rowid>'';

truncate table t;

insert into t select* from t_bak;

TIP:

该列上如果有pk or unique index or 其它index ,可以尝试利用索引来找出corrupted block上的行记录部分信息, 找以前的备份或手工补录。其它方法也可以使用bbed 标注block 状态。

select /*+index(t xxx)*/ COLUMN from t where rowid='';

附上一些相关的sql

[oracle@db231 ~]$ cat dba.sql 
set verify off
col rfile# new_value v_dba_rfile
col block# new_value v_dba_block
col bigfile_block# new_value v_bigfile_block
col dba_object head object for a40 truncate
col dba_DBA head DBA for a20

select 
        dbms_utility.data_block_address_file(to_number('&1','XXXXXXXXXX')) RFILE#, 
        dbms_utility.data_block_address_block(to_number('&1','XXXXXXXXXX')) BLOCK#,
  TO_NUMBER('&1','XXXXXXXXXX') bigfile_block#,
  '-- alter system dump datafile '||dbms_utility.data_block_address_file(to_number('&1','XXXXXXXXXX'))
  ||' block '||dbms_utility.data_block_address_block(to_number('&1','XXXXXXXXXX')) dump_cmd
from dual;

[oracle@db231 ~]$ cat dba2.sql 
col SEGMENT_NAME for a30
col owner for a30
col partition_name for a30
col tablespace_name for a30
select owner, segment_name, partition_name, tablespace_name, extent_id
from dba_extents
where file_id = &1
and &2 between block_id and block_id + blocks - 1;
[oracle@db231 ~]$ cat dba_mk.sql 
set verify off
def fno=&1
def blkno=&2
select  lpad('0x'||trim(to_char(dbms_utility.MAKE_DATA_BLOCK_ADDRESS(&fno, &blkno), 'XXXXXXXX')), 10) rdba from dual;

[oracle@db231 ~]$ cat rowid.sql 
prompt Show file, block, row numbers from rowid &1....
def rowid=&1
select
    dbms_rowid.ROWID_RELATIVE_FNO('&rowid') rfile#
  , dbms_rowid.ROWID_BLOCK_NUMBER('&rowid') block#
  , dbms_rowid.ROWID_ROW_NUMBER('&rowid')   row#
  , lpad('0x'||trim(to_char(dbms_utility.MAKE_DATA_BLOCK_ADDRESS(dbms_rowid.ROWID_RELATIVE_FNO('&rowid')
  , dbms_rowid.ROWID_BLOCK_NUMBER('&rowid')), 'XXXXXXXX')), 10) rowid_dba
  , 'alter system dump datafile '||dbms_rowid.ROWID_RELATIVE_FNO('&rowid')||' block '||
                                   dbms_rowid.ROWID_BLOCK_NUMBER('&rowid')||'; -- @dump '||
                                   dbms_rowid.ROWID_RELATIVE_FNO('&rowid')||' '||
                                   dbms_rowid.ROWID_BLOCK_NUMBER('&rowid')||' .' dump_command
from
    dual
/

[oracle@db231 ~]$ cat rowid_mk.sql 
Prompt get rowid data object id &1
set verify off
def doid=&1
def blk=&2
select  DBMS_ROWID.ROWID_CREATE(1, &doid, &blk) rowid_str from dual;

Quick method to identify table/index mismatch when analyze validate structure cascade takes significant time (文档 ID 1554054.1)

Script to identify TABLE/INDEX inconsistency
Use the next procedure to identify TABLE/INDEX inconsistency, which is based on the above query, for all indexes in a table:

create or replace procedure analyze_quick(owner_table varchar2, name_table varchar2, name_index varchar2 default null)
is
  s varchar2(30000);
  num_indexes number := 0;
  sum_hash number;
begin
  for i in (select a.owner, a.index_name, b.column_name
               from dba_indexes a, dba_ind_columns b
               where a.table_owner = upper(owner_table)
                 and a.table_name  = upper(name_table)
                 and (a.index_name = upper(name_index) or name_index is null)
                 and a.index_type not in ('IOT - TOP'
                                       ,'LOB'
                                       ,'FUNCTION-BASED NORMAL'
                                       ,'FUNCTION-BASED DOMAIN'
                                       ,'CLUSTER')
                 and a.owner = b.index_owner
                 and a.index_name = b.index_name
                 and a.table_name = b.table_name
                 and b.column_position = 1) loop
    num_indexes := num_indexes+1;

    s := 'select /*+ full(t1) parallel */ sum(ora_hash(rowid)) from ';
    s := s || owner_table || '.' || name_table || ' t1 where ' || i.column_name ||' is not null MINUS ';
    s := s || 'select /*+ index_ffs(t '|| i.index_name||') */ sum(ora_hash(rowid)) from ';
    s := s || owner_table || '.' || name_table || ' t where ' || i.column_name ||' is not null';

    begin
      execute immediate s into sum_hash;
      if sum_hash > 0 then
         raise_application_error(-20220,'TABLE/INDEX MISMATCH detected!! Table: '
                                 ||upper(owner_table)||'.'||upper(name_table)
                                 ||' Index: '
                                 ||upper(i.index_name));
      end if;
    exception
      when no_data_found then null;  -- no_data_found means that there is not inconsistency
    end;
  end loop;
  if num_indexes = 0 and name_index is not null then
     raise_application_error(-20221,'Check was not executed. Index '||upper(name_index)||' does not exist for table '||upper(name_table)|| ' or table does not exist');
  elsif num_indexes = 0 then
     raise_application_error(-20222,'Check was not executed. No INDEXES with index_type=NORMAL found for table '||upper(name_table)|| ' or table does not exist');
  end if;

end;
/

SYNTAX

Identify TABLE_NAME inconsistency with all its indexes:

SQL> execute analyze_quick('&TABLE_OWNER','&TABLE_NAME')  

If an inconsistency is detected, the procedure will stop and will not check the rest of indexes for TABLE_NAME.

Identify TABLE_NAME inconsistency with index INDEX_NAME:

SQL> execute analyze_quick('&TABLE_OWNER','&TABLE_NAME', '&INDEX_NAME')

Execution example when TABLE/INDEX inconsistency is identified:

SQL> execute analyze_quick('SCOTT','EMP')
BEGIN analyze_faster('SYS','OBJ$'); END;

*
ERROR at line 1:
ORA-20220: TABLE/INDEX MISMATCH detected!! Table: SCOTT.EMP Index: I_EMP_EMPNO
ORA-06512: at "SYS.ANALYZE_QUICK", line 34
ORA-06512: at line 1

References:
Causes and Solutions for ora-600 [kdsgrp1] (文档 ID 1332252.1)

打赏

,

对不起,这篇文章暂时关闭评论。