ora-600 [ktbdchk1: bad dscn] and ora-8103 corrupted block
前两天一同事的数据库alert log不停的在刷出ora-600 [ktbdchk1: bad dscn]错误,影响的是一张table上的insert语句, 前不久该数据库存储出现过故障,从已知BUG中没有找到相似案例,环境11.2.0.4 2nodes RAC,未配置Data Guard, 这里只是简单的记录一下问题的处理过程。
# ALERT Log
Tue Jul 26 19:28:53 2016 Thread 1 advanced to log sequence 83177 (LGWR switch) Current log# 1 seq# 83177 mem# 0: +DATA/anbob/onlinelog/group_1.261.839353823 Current log# 1 seq# 83177 mem# 1: +FRA/anbob/onlinelog/group_1.257.839353823 Tue Jul 26 19:28:53 2016 Archived Log entry 318062 added for thread 1 sequence 83176 ID 0xded21be5 dest 1: Tue Jul 26 19:30:58 2016 Errors in file /home/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_7655.trc (incident=11956963): ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], [] Incident details in: /home/oracle/diag/rdbms/anbob/anbob1/incident/incdir_11956963/anbob1_ora_7655_i11956963.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. Tue Jul 26 19:31:00 2016 Dumping diagnostic data in directory=[cdmp_20160726193100], requested by (instance=1, osid=7655), summary=[incident=11956963]. Tue Jul 26 19:31:03 2016 Sweep [inc][11956963]: completed Sweep [inc2][11956963]: completed Tue Jul 26 19:34:00 2016 Thread 1 advanced to log sequence 83178 (LGWR switch) Current log# 2 seq# 83178 mem# 0: +DATA/anbob/onlinelog/group_2.262.839353823 Current log# 2 seq# 83178 mem# 1: +FRA/anbob/onlinelog/group_2.258.839353823 Tue Jul 26 19:34:00 2016
# TRACE FILE
Dump file /home/oracle/diag/rdbms/anbob/anbob1/incident/incdir_11956963/anbob1_ora_7655_i11956963.trc Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP, Data Mining and Real Application Testing options ORACLE_HOME = /home/app/oracle/product/11.2.0 System name: Linux Node name: 4gdb1 Release: 2.6.32-279.el6.x86_64 Version: #1 SMP Wed Jun 13 18:24:36 EDT 2012 Machine: x86_64 Instance name: anbob1 Redo thread mounted by this instance: 1 Oracle process number: 134 Unix process pid: 7655, image: oracle@4gdb1 *** 2016-07-26 19:30:58.981 *** SESSION ID:(3010.12441) 2016-07-26 19:30:58.981 *** CLIENT ID:() 2016-07-26 19:30:58.981 *** SERVICE NAME:(anbob) 2016-07-26 19:30:58.981 *** MODULE NAME:() 2016-07-26 19:30:58.981 *** ACTION NAME:() 2016-07-26 19:30:58.981 Dump continued from file: /home/oracle/diag/rdbms/anbob/anbob1/trace/anbob1_ora_7655.trc ORA-00600: internal error code, arguments: [ktbdchk1: bad dscn], [], [], [], [], [], [], [], [], [], [], [] ========= Dump for incident 11956963 (ORA 600 [ktbdchk1: bad dscn]) ======== ----- Beginning of Customized Incident Dump(s) ----- [ktbdchk] -- ktbgcl4 -- bad dscn dependent scn: 0x691a.3c003e00 recent scn: 0x0001.7fa36b1a current scn: 0x0001.7fa36b1a ----- End of Customized Incident Dump(s) ----- *** 2016-07-26 19:30:59.060 dbkedDefDump(): Starting incident default dumps (flags=0x2, level=3, mask=0x0) ----- Current SQL Statement for this session (sql_id=9dws8axhn8m32) ----- insert into ANBOBT1(cust_ability_id,accnbr,ability_code,action_type,start_time,end_time,groupId,create_time,province_code,city_code)values (ANBOBT1_seq.nextval,:1,:2,:3,:4,:5,:6,:7,:8,:9) ----- Call Stack Trace ----- calling call entry argument values in hex location type point (? means dubious value) -------------------- -------- -------------------- ---------------------------- ... dbgexPhaseII()+1764 call ksfdmp() 000000003 ? 000000002 ? 7FFFAED67190 ? 7FFFAED67268 ? 7FFFAED6BD10 ? 000000002 ? dbgexExplicitEndInc call dbgexPhaseII() 7FD46BC75730 ? 7FD46B720EE0 ? ()+755 7FFFAED6FFA8 ? 7FFFAED67268 ? 7FFFAED6BD10 ? 000000002 ? dbgeEndDDEInvocatio call dbgexExplicitEndInc 7FD46BC75730 ? 7FD46B720EE0 ? nImpl()+769 () 7FFFAED6FFA8 ? 7FFFAED67268 ? 7FFFAED6BD10 ? 000000002 ? dbgeEndDDEInvocatio call dbgeEndDDEInvocatio 7FD46BC75730 ? 7FD46B720EE0 ? n()+52 nImpl() 7FFFAED6FFA8 ? 7FFFAED67268 ? 7FFFAED6BD10 ? 000000002 ? ktbValidateDependen call dbgeEndDDEInvocatio 7FD46BC75730 ? 7FD46B720EE0 ? tScn()+432 n() 7FFFAED6FFA8 ? 7FFFAED67268 ? 7FFFAED6BD10 ? 000000002 ? ktbgcl1()+2030 call ktbValidateDependen 7FFFAED74DF8 ? 009E69A1C ? tScn() 7FFFAED6FFA8 ? 7FFFAED67268 ? 7FFFAED6BD10 ? 000000002 ? ktbcfs()+92 call ktbgcl1() 7FFFAED74D30 ? 000000002 ? 7FFFAED70C2C ? 000000002 ? 691A00000000 ? 000000002 ? kdt_bseg_srch_cbk() call ktbcfs() 7FFFAED74D30 ? 7FFFAED7108C ? +1769 000000000 ? 000000002 ? 691A00000000 ? 000000002 ? ktspfpblk()+527 call kdt_bseg_srch_cbk() 7FFFAED74D30 ? 7FFFAED7108C ? 000000000 ? 000000002 ? 691A00000000 ? 000000002 ? ktspfsrch()+503 call ktspfpblk() 7FFFAED71460 ? 7FFFAED7108C ? 000000000 ? 000000002 ? 691A00000000 ? 000000002 ? ktspscan_bmb()+509 call ktspfsrch() 7FFFAED71460 ? 7FFFAED7108C ? 000000000 ? 000000002 ? 691A00000000 ? 000000002 ? ktspgsp_main()+856 call ktspscan_bmb() 057974305 ? 7FFFAED7108C ? 000000000 ? 000000002 ? 691A00000000 ? 000000002 ? kdtgsp()+2701 call ktspgsp_main() 057974305 ? 7FFFAED7108C ? ... ... GLOBAL CACHE ELEMENT DUMP (address: 0x9c7e35a10): id1: 0x1ada9 id2: 0xd pkey: OBJ#87760 block: (13/109993) lock: X rls: 0x0 acq: 0x0 latch: 20 flags: 0x20 fair: 0 recovery: 0 fpin: 'ktspbwh1: ktspfsrch' bscn: 0x0.129cc03a bctx: (nil) write: 0 scan: 0x0 lcp: (nil) lnk: [NULL] lch: [0x9af615b50,0x9af615b50] seq: 438 hist: 67 143:0 208 352 197 48 121 67 143:0 208 352 197 48 121 LIST OF BUFFERS LINKED TO THIS GLOBAL CACHE ELEMENT: flg: 0x00000000 state: XCURRENT tsn: 10 tsh: 0 mode: EXCL pin: 'ktspbwh1: ktspfsrch' addr: 0x9af615a18 obj: 87760 cls: DATA bscn: 0x0.129cc03a buffer tsn: 10 rdba: 0x0341ada9 (13/109993) scn: 0x0000.129cc03a seq: 0x01 flg: 0x04 tail: 0xc03a2801 frmt: 0x02 chkval: 0x52bd type: 0x28=PAGETABLE MANAGED LOB BLOCK <<<<<<<<<< Hex dump of block: st=0, typ_found=1 ... ... ... Long field block dump: Object Id 87600 <<<<<<<<<<<<<<<<<<< LobId: 0001000131A69 PageNo 2 <<<<<<<<<<<<<<<<<<< Version: 0x0000.00000002 pdba: 54634113 00 65 00 3e 75 28 62 37 59 d3 54 0d 00 3c 00 2f 00 50 00 61 00 72 00 61 00 4e 00 61 00 6d 00 65 00 3e 00 3c 00 4e 00 65 00 77 00 50 00 61 00 72 ...
From MOS
Format: ORA-600 [ktbdchk1: bad dscn] [a] [b] [c] [d] [e]
SUGGESTIONS:
Check all machine hardware for possible errors.
Run dbverify (dbv) utility on the datafiles in question.
Run a full export on the database and check for corruption.
If possible rebuild any corrupted objects with export/drop/importNB Prob Bug Fixed Description
* III 22241601 12.2.0.0 ORA-600 [kdsgrp1] ORA-1555 / ORA-600 [ktbdchk1: bad dscn] due to Invalid Commit SCN in INDEX block
*D IIII 8895202 11.2.0.2, 12.1.0.1 ORA-1555 / ORA-600 [ktbdchk1: bad dscn] ORA-600 [2663] in Physical Standby after switchover – superseded
II 9674932 Same fix as 8895202 for ORA-600 [ktbdchk1: bad dscn] – use 8895202 instead
E III 7517208 10.2.0.5, 11.2.0.1 DBV enhanced to identify Logical SCN Block corruptions
* III 5380055 10.2.0.3, 11.1.0.6 ORA-1555 / corruption if switch to Standby then back to Primary
– 3610343 9.2.0.7, 10.1.0.4, 10.2.0.1 OERI:[ktbdchk1: bad dscn] in RAC
“
查看当时的建表DDL
CREATE TABLE "WEEJAR"."ANBOBT1" ( "CUST_ABILITY_ID" NUMBER(12,0) NOT NULL ENABLE, "ACCNBR" VARCHAR2(32) NOT NULL ENABLE, "ABILITY_CODE" VARCHAR2(32) NOT NULL ENABLE, "ACTION_TYPE" VARCHAR2(32) NOT NULL ENABLE, "START_TIME" DATE NOT NULL ENABLE, "END_TIME" DATE NOT NULL ENABLE, "GROUPID" NUMBER(9,0), "CREATE_TIME" DATE, "PROVINCE_CODE" VARCHAR2(10), "CITY_CODE" VARCHAR2(10) ) SEGMENT CREATION IMMEDIATE PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING STORAGE(INITIAL 131072 NEXT 8192 MINEXTENTS 1 MAXEXTENTS 2147483645 PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT);
因为不能远程连接,确认问题要过中间两个人,所以没有记录一些具体信息,且数据库还一直OPEN,只是影响了部分(前几天数据库存储坏过)。
这里有个疑问的是为什么表上无lob字段, trace 中提示有lob type? 是oracle写异常? 如果你知道请mail告诉我。
分析方法:
1, 查找TRACE 中的OBJ# ,确认对象就是insert 所使用的表
2, 确认了该表上无索引,且无LOB 字段
3, flush buffer cache
4, CTAS 失败 报ora-8103
5, ANALYZE TABLE [TABLENAME] STRUCTURE CASCADE online; 失败 报ora-8103
所以最后按照解决 表 ora-8103的方法就可以避免这个ora-600, 这个案例确认该表不重要可以丢失部分数据。
1,找到corrupted block, 可以使用8103 trace 进一步跟踪也可以先解决ora-600 trace中的数据块先, dbv(rman) verify该数据文件再查询v$database_block_corruption
2,如果有unique + not null索引 ,那样当然可以使用MOS 上提供的脚本,根据rowid跳过异常的行恢复
3,如果没有索引,可以根据dba+obj# 使用DBMS_ROWID.ROWID_CREATE 生成大概的rowid(不知行数),也可以使用rowid跳过
4,如果了解bbed, 也可以尝试修改ktbbhcsc 里的SCN为略小于当前的SCN
5, 利用抽取类工具
6, 如果该表数据目前不重要,新建表,rename互换,后期手动追加数据
对不起,这篇文章暂时关闭评论。