ORA-600 [kdBlkCheckError][X],[X],[38504] and ORA-600[4194],[],[] in 11.2.0.4

Symptom:
The database crashed within a few minutes of every startup. The environment was Oracle Database 11.2.0.4, single instance on Linux, running in a VMware 6 guest. A dynamic disk allocation had been done before the problem occurred. Checking the alert log showed ORA-600 [kdBlkCheckError] and ORA-600 [4194] errors.

SUGGESTIONS:
ORA-600 [kdBlkCheckError]
Kernel Data Block Check Error. It is raised when a logically corrupted data block is detected. The usual cause is an Oracle bug or memory corruption.

ORA-600 [4194] [a] [b]

VERSIONS:
versions 6.0 to 10.1
DESCRIPTION:
A mismatch has been detected between Redo records and rollback (Undo)   records.
We are validating the Undo record number relating to the change being  applied against the maximum undo record number recorded in the undo block. This error is reported when the validation fails.

ARGUMENTS:
Arg [a] Maximum Undo record number in Undo block
Arg [b] Undo record number from Redo block

FUNCTIONALITY:
Kernel Transaction Undo called from Cache layer

Note: in this case the ORA-600 [4194] arguments [a] and [b] are both empty, which suggests that the undo tablespace is corrupted. In rare cases (usually DBA error) the Oracle UNDO tablespace can become corrupted. This manifests with this error:
ORA-00376: file xx cannot be read at this time

Alert log
=======================

Errors in file /oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_3058.trc  (incident=19319):
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38504], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/orcl/orcl/incident/incdir_19319/orcl_smon_3058_i19319.trc
replication_dependency_tracking turned off (no async multimaster replication found)
Starting background process QMNC
Mon Sep 01 15:24:13 2014
QMNC started with pid=22, OS id=3091
Completed: alter database open
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.
Block recovery from logseq 123, block 58 to scn 2959315
Recovery of Online Redo Log: Thread 1 Group 3 Seq 123 Reading mem 0
Mem# 0: /oracle/oradata/orcl/redo03.log
Block recovery completed at rba 123.103.16, scn 0.2959316
Errors in file /oracle/diag/rdbms/orcl/orcl/trace/orcl_smon_3058.trc:
ORA-01595: error freeing extent (3) of rollback segment (7))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38504], [], [], [], [], [], [], [], []
Starting background process CJQ0
Mon Sep 01 15:24:15 2014
CJQ0 started with pid=24, OS id=3105
Mon Sep 01 15:24:15 2014
Dumping diagnostic data in directory=[cdmp_20140901152415], requested by (instance=1, osid=3058 (SMON)), summary=[incident=19319].
Mon Sep 01 15:24:15 2014
Errors in file /oracle/diag/rdbms/orcl/orcl/trace/orcl_m000_3103.trc  (incident=19399):
ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
Incident details in: /oracle/diag/rdbms/orcl/orcl/incident/incdir_19399/orcl_m000_3103_i19399.trc
Use ADRCI or Support Workbench to package the incident.

PMON trace

Flush retried for xcb 0xbcedf500, pmd 0xbffcf1c8
kti: Reconstructing undo block 0xc002e2 for xcb 0xbcedf500
Doing block recovery for file 3 block 738
Block header before block recovery:
buffer tsn: 2 rdba: 0x00c002e2 (3/738)
scn: 0x0000.00292004 seq: 0x01 flg: 0x04 tail: 0x20040201
frmt: 0x02 chkval: 0xa0c7 type: 0x02=KTU UNDO BLOCK
Resuming block recovery (PMON) for file 3 block 738
Block recovery from logseq 123, block 58 to scn 2959315

*** 2014-09-01 15:24:26.856
Recovery of Online Redo Log: Thread 1 Group 3 Seq 123 Reading mem 0
Block recovery completed at rba 123.103.16, scn 0.2959317
==== Redo read statistics for thread 1 ====
Total physical reads (from disk and memory): 362Kb
-- Redo read_disk statistics --
Read rate (ASYNC): 0Kb in 0.01s => 0.00 Mb/sec
-- Redo read_memory statistics --
Read disk 0Kb and read memory 362Kb, hit-ratio=1.00
Longest record: 1Kb, moves: 0/71 (0%)
Longest LWN: 6Kb, moves: 0/17 (0%), moved: 0Mb
Last redo scn: 0x0000.002d27d3 (2959315)
-------------------------------------------------------
IMU redo block change list
------------------------------------------------------
tsn 1 rdba 0x814d80 bh 0xa8ff4cf0 cv 0xbcfb8070
------------------------------------------------------
KTB Redo
op: 0x11  ver: 0x01
compat bit: 4 (post-11) padding: 1
op: F  xid:  0x0007.00d.0000052f    uba: 0x00c002e2.02d5.17
Block cleanout record, scn:  0x0000.002d28d6 ver: 0x01 opt: 0x02, entries follow...
itli: 1  flg: 2  scn: 0x0000.002d28cd
itli: 2  flg: 2  scn: 0x0000.002d28d5
KDO Op code: DRP row dependencies Disabled
xtype: XA flags: 0x00000000  bdba: 0x00814d80  hdba: 0x00800eba
itli: 1  ispac: 0  maxfr: 4858
tabn: 0 slot: 28(0x1c)
------------------------------------------------------
tsn 2 rdba 0xc000e0 bh 0xaafa81a8 cv 0xbcfb8178
------------------------------------------------------
ktudh redo: slt: 0x000d sqn: 0x0000052f flg: 0x0012 siz: 304 fbi: 0
uba: 0x00c002e2.02d5.17    pxid:  0x0000.000.00000000
------------------------------------------------------
tsn 1 rdba 0x800ec5 bh 0xaafac378 cv 0xbcfb8248
------------------------------------------------------
index redo (kdxlde):  delete leaf row
KTB Redo
op: 0x01  ver: 0x01
compat bit: 4 (post-11) padding: 1

ORA-00600: internal error code, arguments: [4194], [], [], [], [], [], [], [], [], [], [], []
kjzduptcctx: Notifying DIAG for crash event
----- Abridged Call Stack Trace -----
ksedsts()+465<-kjzdssdmp()+267<-kjzduptcctx()+232<-kjzdicrshnfy()+63<-ksuitm()+5570<-ksbrdp()+3507<-opirip()+623<-opidrv()+603<-sou2o()+103<-opimai_real()+250<-ssthrdmain()+265<-main()+201<-__libc_start_main()+253
----- End of Abridged Call Stack Trace -----

SMON trace

Block Checking: DBA = 12583136, Block Type = System Managed Segment Header Block
ERROR: SMU Segment Header Corrupted.  Error Code = 38504
ktu4smck: SCN commited txn list is not sorted.
previous txn slot=25, scn=0x0000.00291da4
offending txn slot=20, scn=0x0000.00291517
TRN CTL:: seq: 0x02d5 chd: 0x0005 ctl: 0x0005 inc: 0x00000000 nfb: 0x0001
mgc: 0xb000 xts: 0x0068 flg: 0x0001 opt: 2147483646 (0x7ffffffe)
uba: 0x00c002e2.02d5.17 scn: 0x0000.00291c12
Version: 0x01
FREE BLOCK POOL::
uba: 0x00000000.02d5.16  ext: 0x2  spc: 0x1496
uba: 0x00c002e0.02d5.0b  ext: 0x2  spc: 0xb74
uba: 0x00000000.02ab.1d  ext: 0x7  spc: 0x8cc
uba: 0x00000000.0225.01  ext: 0x2  spc: 0x1f84
uba: 0x00000000.0000.00  ext: 0x0  spc: 0x0
TRN TBL::
index  state cflags  wrap#    uel         scn            dba            parent-xid    nub       bcl     cmt
-----------------------------------------------------------------------------------------
0x00    9    0x00  0x052d  0x0008  0x0000.00291641  0x00c00121  0x0000.000.00000000  0x00000001   0x00000000    1405993807
0x01    9    0x00  0x052a  0x0021  0x0000.00291b21  0x00c0012b  0x0000.000.00000000  0x00000001   0x00000000    1405985479
0x02    9    0x00  0x0511  0x0007  0x0000.00291d56  0x00c0012d  0x0000.000.00000000  0x00000003   0x00000000    1406001110
0x03    9    0x00  0x052e  0x000b  0x0000.0029187c  0x00c00127  0x0000.000.00000000  0x00000001   0x00000000    1405995007
...
0x0d   10    0x00  0x052f  0x0002  0x0000.002d27b1  0x00000000  0x0000.000.00000000  0x00000000   0x00000000    0
0x0e    9    0x00  0x052d  0xffdd  0x0000.002915f2  0x00c008de  0x0000.000.00000000  0x00000001   0x00000000    1405987108
0x0f    9    0x00  0x052e  0x0002  0x0000.0029175e  0x00c00125  0x0000.000.00000000  0x00000003   0x00000000    1405994408
EXT TRN CTL::
usn: 7
sp1:0x00000000 sp2:0x00000000 sp3:0x00000000 sp4:0x00000000
sp5:0x00000000 sp6:0x00000000 sp7:0x00000000 sp8:0x00000000

TYP:0 CLS:29 AFN:3 DBA:0x00c000e0 OBJ:4294967295 SCN:0x0000.00292161 SEQ:1 OP:5.2 ENC:0 RBL:0
ktudh redo: slt: 0x000d sqn: 0x0000052f flg: 0x0411 siz: 80 fbi: 0
uba: 0x00c002e2.02d5.17    pxid:  0x0000.000.00000000
...

Disk Block image:
buffer rdba: 0x00c000e0
scn: 0x0000.00292161 seq: 0x01 flg: 0x04 tail: 0x21612601
frmt: 0x02 chkval: 0xcc65 type: 0x26=KTU SMU HEADER BLOCK
...
SMON: following errors trapped and ignored:
ORA-01595: error freeing extent (3) of rollback segment (7))
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kdBlkCheckError], [3], [224], [38504], [], [], [], [], [], [], [], []
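
The ktu4smck message in the SMON trace can be checked by hand. The following is a minimal sketch, not Oracle's actual check: it assumes that the committed transaction list is expected to have non-decreasing SCNs in walk order, which is what the "previous" and "offending" values in the trace suggest. The helper names parse_scn and first_unsorted are made up for illustration.

```python
def parse_scn(text):
    """Convert a dump-style SCN such as '0x0000.00291da4' to one integer.

    The part before the dot is the SCN wrap, the part after it the SCN base.
    """
    wrap, base = text.split(".")
    return (int(wrap, 16) << 32) | int(base, 16)


def first_unsorted(entries):
    """Return the first (previous, offending) pair whose SCN decreases.

    entries: list of (slot, scn_text) in the order the list is walked.
    Returns None when the SCNs never decrease (assumed "sorted").
    """
    for prev, cur in zip(entries, entries[1:]):
        if parse_scn(cur[1]) < parse_scn(prev[1]):
            return prev, cur
    return None


# The two slots named by ktu4smck in the SMON trace above.
walk = [(25, "0x0000.00291da4"), (20, "0x0000.00291517")]
print(first_unsorted(walk))
```

The same parse_scn helper also ties the decimal and hexadecimal SCNs in the logs together: "Last redo scn: 0x0000.002d27d3 (2959315)" in the PMON trace is the SCN 2959315 that the alert log reports for block recovery.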

Tip:

The trace file content above has been truncated.
CLS ==> Block class. Classes above 16 are reserved for undo segments, and the block class depends on the undo segment number. Each undo segment has two block classes: one for the undo segment header and one for the undo segment blocks. Class 29 is therefore the undo header of undo segment 7.
AFN ==> Absolute file number
DBA:0x00c000e0 ==> datafile 3, block 224
OP ==> Redo operation code. At the start of a transaction the redo operation is 5.2.
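
The DBA and CLS values in the tip can be decoded mechanically. This is a small sketch under two stated assumptions that are not spelled out in the trace itself: a 32-bit DBA holds the relative file number in its upper 10 bits and the block number in its lower 22 bits, and undo segment n uses class 15 + 2n for its header and 16 + 2n for its undo blocks. The function names are illustrative only.

```python
def decode_dba(dba):
    """Split a 32-bit data block address into (file number, block number).

    Assumes the usual layout: upper 10 bits file, lower 22 bits block.
    """
    return dba >> 22, dba & 0x3FFFFF


def undo_class_to_usn(cls):
    """Map a block class to (undo segment number, kind of block).

    Assumes class 15 + 2*n is the header of undo segment n and
    class 16 + 2*n holds its undo blocks.
    """
    if cls < 15:
        raise ValueError("class %d is not an undo class" % cls)
    usn, is_block = divmod(cls - 15, 2)
    return usn, "undo block" if is_block else "undo header"


print(decode_dba(0x00C000E0))   # DBA from the redo record header
print(undo_class_to_usn(29))    # CLS from the redo record header
```

Under these assumptions the decimal "DBA = 12583136" printed by Block Checking and the hexadecimal 0x00c000e0 decode to the same file and block, and 0x00c002e2 decodes to the "file 3 block 738" that PMON reports.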

In the above example the class (CLS) is 29, so we can determine that this transaction is using undo segment number 7. We can also see that the slot number (slt) is 0x000d and the sequence number (sqn) is 0x0000052f.
For this transaction, the XID will be
XID: 0x0007.00d.0000052f

# The xid is 8 bytes, composed of the undo segment number, the undo segment header transaction table slot, and the sequence (wrap) number.
# The uba is 8 bytes, composed of the DBA of the undo block, the sequence number, and the record number in the block.
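
As a quick illustration of the two layouts, the sketch below formats and parses the dotted hexadecimal notation used in the dumps. It works only on the text form shown in the traces, not on the on-disk bytes, and the names Xid, Uba, format_xid, parse_xid and parse_uba are hypothetical helpers.

```python
from collections import namedtuple

Xid = namedtuple("Xid", "usn slot sqn")   # undo segment, slot, wrap/sequence
Uba = namedtuple("Uba", "dba seq rec")    # undo block DBA, sequence, record


def format_xid(usn, slot, sqn):
    """Render an XID the way the redo dump prints it."""
    return "0x%04x.%03x.%08x" % (usn, slot, sqn)


def parse_xid(text):
    """Parse a dump-style XID such as '0x0007.00d.0000052f'."""
    usn, slot, sqn = (int(part, 16) for part in text.split("."))
    return Xid(usn, slot, sqn)


def parse_uba(text):
    """Parse a dump-style UBA such as '0x00c002e2.02d5.17'."""
    dba, seq, rec = (int(part, 16) for part in text.split("."))
    return Uba(dba, seq, rec)


# usn 7 (from CLS 29), slt 0x000d and sqn 0x0000052f from the ktudh redo line.
print(format_xid(7, 0x000D, 0x0000052F))
print(parse_uba("0x00c002e2.02d5.17"))
```

Parsing the uba from the trace gives the undo block DBA 0x00c002e2, which is the block that PMON tried to reconstruct.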

Transaction recovery (the process of rolling back transactions) is performed:

  1.     By the shadow process when a rollback statement is issued
  2.     By PMON when a session (process) crashes with a transaction in progress
  3.     By SMON or a shadow process on opening a database that crashed with active transactions

If the RDBMS instance crashes before the transaction is committed, there is no time to roll back the transaction. The next time the database is opened, crash recovery rolls the database forward, returning it to its pre-crash state. It is then the responsibility of transaction recovery to remove any incomplete transactions.

Transaction recovery at database open

  1.     Active transactions in the SYSTEM rollback segment are immediately rolled back
  2.     Active transactions in other rollback segments are marked as “dead”
  3.     At a later time, SMON scans the segments again and performs a rollback on dead transactions

Deferring the rollback in this way is especially useful when downtime must be kept to an absolute minimum. A side effect of this behavior is that databases now open, in most cases, even if a rollback segment is corrupt, with errors logged to the alert log and the SMON trace file. This makes it much easier to diagnose the failure, because you have access to the database. However, it would be easy for customers to run for some time without realizing that they have a problem. If the server is unable to read the rollback segment header, because the data file is offline or corrupted, you still cannot open the database. ALL rollback segments found in undo$, not just those specified with the rollback_segments parameter, are checked for active transactions when the database is opened.

We can also find dead transactions by dumping the rollback segment header and checking the state column of the transaction table dump. To dump the rollback segment header, use the following command:

SQL> alter system dump undo header '<rollback segment name>';

An active transaction is identified by state = 10 in the transaction table.

For example:

#session 1
SQL> delete tt;
--no commit 
#session 2
sys@ANBOB>select xidusn,xidslot,start_scnb,status from v$transaction;
XIDUSN               XIDSLOT              START_SCNB           STATUS
-------------------- -------------------- -------------------- ----------------
2                    22                   242359942             ACTIVE

sys@ANBOB>select usn,extents,status,curblk from v$rollstat where XACTS>0;
USN                  EXTENTS               STATUS         CURBLK
-------------------- -------------------- --------------- --------------------
2                    3                    ONLINE          2

sys@ANBOB>select * from v$rollname;
USN NAME
-------------------- ------------------------------
0 SYSTEM
1 _SYSSMU1_1240252155$
2 _SYSSMU2_111974964$
3 _SYSSMU3_4004931649$
4 _SYSSMU4_1126976075$
5 _SYSSMU5_4011504098$
6 _SYSSMU6_3654194381$
7 _SYSSMU7_4222772309$
8 _SYSSMU8_3612859353$
9 _SYSSMU9_3945653786$
10 _SYSSMU10_3271578125$

sys@ANBOB>alter system dump undo header '_SYSSMU2_111974964$';

sys@ANBOB>select * from v$diag_info;

trace file

======================================
Version: 0x01
FREE BLOCK POOL::
uba: 0x00000000.1bf4.16 ext: 0x1  spc: 0x1496
uba: 0x00000000.1bf4.02 ext: 0x1  spc: 0x1f06
uba: 0x00000000.1bf3.06 ext: 0x0  spc: 0x136a
uba: 0x00000000.19f6.42 ext: 0x2  spc: 0x9c2
uba: 0x00000000.081c.04 ext: 0x34 spc: 0x1dae
TRN TBL::
index  state cflags  wrap#    uel         scn            dba            parent-xid    nub     stmt_num    cmt
------------------------------------------------------------------------------------------------
0x00    9    0x00  0x610a  0x000a  0x0008.0e72174d  0x00000000  0x0000.000.00000000  0x00000000   0x00000000  1409648053
0x01    9    0x00  0x6112  0x001c  0x0008.0e7219a0  0x00c00096  0x0000.000.00000000  0x00000001   0x00000000  1409648434
0x02    9    0x00  0x6115  0x0018  0x0008.0e721ca1  0x00c00089  0x0000.000.00000000  0x00000001   0x00000000  1409649254
0x03    9    0x00  0x610d  0x0019  0x0008.0e72198c  0x00c00096  0x0000.000.00000000  0x00000001   0x00000000  1409648434
... (truncated)
0x12    9    0x00  0x6110  0x001e  0x0008.0e721dec  0x00000000  0x0000.000.00000000  0x00000000   0x00000000  1409649794
0x13    9    0x00  0x6108  0x0020  0x0008.0e721b8c  0x00c00088  0x0000.000.00000000  0x00000001   0x00000000  1409649033
0x14    9    0x00  0x60fe  0x001b  0x0008.0e72190d  0x00c00095  0x0000.000.00000000  0x00000001   0x00000000  1409648433
0x15    9    0x00  0x610c  0x0009  0x0008.0e721ba5  0x00c00088  0x0000.000.00000000  0x00000001   0x00000000  1409649033
0x16   10    0x80  0x6108  0x0001  0x0008.0e721e86  0x00c0008a  0x0000.000.00000000  0x00000001   0x00000000  0
0x17    9    0x00  0x610b  0x0010  0x0008.0e72189c  0x00c00aed  0x0000.000.00000000  0x00000001   0x00000000  1409648413
0x18    9    0x00  0x60fb  0x0012  0x0008.0e721cac  0x00c00089  0x0000.000.00000000  0x00000001   0x00000000  1409649254
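
Scanning a long TRN TBL dump for state 10 by eye is error-prone, so a few lines of script can do it. The sketch below assumes the column order shown in the dump above (index, state, cflags, wrap#, uel, scn, dba, ...). The function name active_slots is made up, and the sample text is two rows copied from the trace.

```python
def active_slots(dump_text):
    """Yield (slot index, wrap#, undo block DBA) for rows whose state is 10."""
    for line in dump_text.splitlines():
        fields = line.split()
        # Data rows start with a hex slot index followed by a decimal state.
        if len(fields) < 7 or not fields[0].startswith("0x"):
            continue
        if not fields[1].isdigit():
            continue
        if int(fields[1]) == 10:
            yield int(fields[0], 16), int(fields[3], 16), int(fields[6], 16)


sample = """\
index  state cflags  wrap#    uel         scn            dba
0x15    9    0x00  0x610c  0x0009  0x0008.0e721ba5  0x00c00088  0x0000.000.00000000  0x00000001   0x00000000  1409649033
0x16   10    0x80  0x6108  0x0001  0x0008.0e721e86  0x00c0008a  0x0000.000.00000000  0x00000001   0x00000000  0
"""

for slot, wrap, dba in active_slots(sample):
    print("active slot %d wrap 0x%04x undo dba 0x%08x" % (slot, wrap, dba))
```

The only state 10 row is index 0x16, which is decimal 22 and matches XIDSLOT 22 reported by v$transaction for the uncommitted delete.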

Using the _offline_rollback_segments or _corrupted_rollback_segments parameters changes the behavior of the RDBMS when:
 Opening the database
 Performing consistent read and delayed block cleanout
 Dropping a rollback segment

When opening a database, any rollback segments listed in the _offline or _corrupted parameters:
 Are not scanned, and any active transactions are neither marked as dead nor rolled back
 Appear offline in dba_rollback_segs (undo$)
 Cannot be acquired by the instance for new transactions

Solution:
In cases of undo tablespace corruption, you must:

  1. Change the undo_management parameter from “AUTO” to “MANUAL”
  2. Drop the old UNDO tablespace
  3. Create a new UNDO tablespace
  4. Change the undo_management parameter from  “MANUAL” to “AUTO”

If the database can be opened, refer to How to Change the Existing Undo Tablespace to a New Undo Tablespace (Doc ID 431652.1).

If the database cannot be opened, or stays open only briefly:

1. Identify the bad segment:

select  segment_name,status from  dba_rollback_segs
where   tablespace_name='xxx'
and   status = 'NEEDS RECOVERY';

or

#strings system01.dbf | grep _SYSSMU | cut -d $ -f 1 | sort -u
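
Where the strings utility is not available, the same scan can be approximated in a few lines of script. This is a rough sketch, not a supported tool: it looks for byte sequences that resemble system-managed undo segment names in any file, and the name find_smu_names and the regular expression are assumptions of this example. Unlike the shell pipeline above, it keeps the trailing $ in each name.

```python
import re

# Matches names such as _SYSSMU7$ and _SYSSMU7_1394480367$.
SMU_NAME = re.compile(rb"_SYSSMU\d+(?:_\d+)?\$")


def find_smu_names(path, chunk_size=1 << 20):
    """Return the sorted, de-duplicated undo segment names found in a file."""
    names = set()
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            data = tail + chunk
            for match in SMU_NAME.finditer(data):
                names.add(match.group().decode("ascii"))
            # Keep an overlap so a name split across two chunks is still found.
            tail = data[-64:]
    return sorted(names)
```

Calling find_smu_names on a copy of the SYSTEM datafile would list candidate names. As with the strings approach, the output can include segments that no longer exist, so it is only a starting point for choosing what to put in the hidden parameter.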

2. Bounce the instance with the hidden parameter _offline_rollback_segments or _corrupted_rollback_segments set in init.ora (or in the spfile), specifying the bad segment name:

*.undo_management='MANUAL'
*.undo_tablespace='SYSTEM'
*._offline_rollback_segments=('_SYSSMU7_1394480367$','xxx')

Tip: in this case only rollback segment #7 needs to be listed.

Normally you cannot drop a rollback segment if it contains active transactions. You can circumvent this by using these parameters, but if you drop an _offline or _corrupted rollback segment that contains an active transaction, you risk logical corruption, possibly in the data dictionary.

3. Restart the database, then drop the corrupt segment and tablespace:

startup pfile=''
drop rollback segment "_SYSSMU7_1394480367$";
drop tablespace UNDOTBS1 including contents and datafiles;

4. Create a new undo tablespace and set it as the default undo tablespace, restore undo_management to AUTO in the pfile, and restart the instance.
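
Step 4 can be sketched as a small script that prepares the text to use. Everything named here is hypothetical: the tablespace name UNDOTBS2, the datafile path, the size, and the helper names restore_pfile and create_undo_sql. The generated statement uses the standard create undo tablespace syntax, and the pfile edit drops the hidden parameters and points undo back at automatic management.

```python
HIDDEN = ("_offline_rollback_segments", "_corrupted_rollback_segments")


def restore_pfile(lines, new_undo_ts):
    """Return pfile lines with the recovery-only settings removed.

    Rewritten lines always use the '*.' prefix (an assumption of this sketch).
    """
    out = []
    for line in lines:
        key = line.split("=", 1)[0].strip().lower()
        if key.endswith(HIDDEN):
            continue  # drop the hidden parameters entirely
        if key.endswith("undo_management"):
            line = "*.undo_management='AUTO'"
        elif key.endswith("undo_tablespace"):
            line = "*.undo_tablespace='%s'" % new_undo_ts
        out.append(line)
    return out


def create_undo_sql(new_undo_ts, datafile, size="1G"):
    """Build the statement that creates the replacement undo tablespace."""
    return "create undo tablespace %s datafile '%s' size %s;" % (
        new_undo_ts, datafile, size)


old = [
    "*.undo_management='MANUAL'",
    "*.undo_tablespace='SYSTEM'",
    "*._offline_rollback_segments=('_SYSSMU7_1394480367$')",
]
print(create_undo_sql("UNDOTBS2", "/oracle/oradata/orcl/undotbs02.dbf"))
print("\n".join(restore_pfile(old, "UNDOTBS2")))
```

The create statement would be run while the instance is still up on the temporary pfile. The rewritten pfile is then used for the next restart.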

5. Export the full database, drop and recreate the database, then import (recommended).

Always make sure to change your database back into a supported state by solving the problems, removing the special settings in your parameter file, shutting the instance down, and performing a normal startup. Although the database may seem to run smoothly, certain corruption problems can come back even after a long time, potentially causing a lot more problems than they did in the first place.

NOTE:
When these undocumented parameters are used, the transaction table is not read when the database is opened, so transactions are neither marked as dead nor rolled back. The database is in an unsupported state.

Undocumented Parameters: More Effects On CR and Cleanout

  1. If an open ITL is found to be associated with an _offline segment, the segment is read to find the transaction status
  2. If committed, the block is cleaned out
  3. If active and you want to read the block, a CR copy is constructed using undo from the segment
  4. If active and you want to lock the row, undesirable behavior may result

If an open ITL is found to be associated with a _corrupted segment, the segment is not read to find the transaction status
It is as if the rollback segment had been dropped; the transaction is assumed to be committed and delayed block cleanout is performed
If the transaction was not committed, logical corruption will occur
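
The difference between the two parameters can be summarized as a small decision function. This only restates the rules above as a toy model. It is not how the kernel is implemented, and the function name and return strings are invented for the example.

```python
def open_itl_outcome(segment_mode, committed, intent="read"):
    """Model what happens when an open ITL points at a listed segment.

    segment_mode: "offline" or "corrupted" (which hidden parameter lists it)
    committed:    the real state of the transaction
    intent:       "read" or "lock"
    """
    if segment_mode == "corrupted":
        # The segment is never read: the transaction is assumed committed.
        if committed:
            return "block cleaned out"
        return "block cleaned out, logical corruption"
    if segment_mode == "offline":
        # The segment is read to find the real transaction status.
        if committed:
            return "block cleaned out"
        if intent == "read":
            return "CR copy built from undo"
        return "undesirable behavior may result"
    raise ValueError("unknown segment mode: %r" % segment_mode)


for mode in ("offline", "corrupted"):
    for committed in (True, False):
        print(mode, committed, "->", open_itl_outcome(mode, committed))
```

The model makes the risk visible: only the _corrupted case with an uncommitted transaction silently turns into logical corruption, which is why _offline_rollback_segments is the less dangerous of the two when the segment can still be read.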

 

References: MOS and DSI
