首页 » Cloud, ORACLE 9i-23ai » Oracle数据库当遇到存储磁盘坏道时的处理(DBV-00102)

Oracle数据库当遇到存储磁盘坏道时的处理(DBV-00102)

数据库环境有时会因为硬件磁盘问题导致数据不可读,而硬盘坏道”便是这其中最常见的问题, 当出现因为磁盘坏道里更加棘手,无法移动或跳过,更甚至因为有坏盘在换盘后RAID重组出现文件系统勘误导致文件为0bytes,增加恢复难度,例如使用dbv 检查时会出现如下报错:

$ dbv file=/oracle/oradata/anbob_01.dbf

DBVERIFY: Release 10.2.0.4.0 - Production on Mon Dec 28 15:41:24 2020
Copyright (c) 1982, 2007, Oracle.  All rights reserved.
DBVERIFY - Verification starting : FILE = /oracle/oradata/anbob_01.dbf
DBV-00102: File I/O error on FILE (/oracle/oradata/anbob_01.dbf) during verification read operation (-2)

$ cp /oracle/oradata/anbob_01.dbf  /oracle/oradata/anbob_01.dbf_bak
cp: anbob_01.dbf : I/O error

Note:
可见文件系统复制都是报错。

出现这类问题时如果数据库文件在文件系统,可以使用dd或其它工具重组数据文件,跳过坏道数据块,重建坏道数据库涉及的对象;如果数据文件在ASM中处理相对麻烦,也是可以脚本拼抽数据文件从ASM到文件系统,或AMDU, 如果单个数据文件比较到,坏的ASM 坏比较少,可以深度bbed修改ASM METADATA指定原AU到新空白AU 位置欺骗oracle,然后尽快重建数据库。

可以自己写脚本试探数据文件的坏道范围, 文件系统的dd 读坏道时

$ dd if=/oracle/oradata/anbob_01.dbf of=/dev/null bs=8192 iseek=142843 count=100000
read: I/O error
0+0 records in
0+0 records out

$ dd if=/oracle/oradata/anbob_01.dbf of=/dev/null bs=8192 iseek=142933 count=100000
100000+0 records in
100000+0 records out

从dba_extents 确认范围内数据库对象, 然后dd 组装新数据文件,rename datafile. recover datafile有可能会出现

SQL> SQL> recover datafile 36;
ORA-00283: recovery session canceled due to errors
ORA-12801: error signaled in parallel query server P004
ORA-00600: internal error code, arguments: [3020], [36], [142864], [151137808],
[], [], [], []
ORA-10567: Redo is inconsistent with data block 

SQL> recover datafile 36 allow 1 corruption;
Media recovery complete.

SQL> alter database datafile 36 online;
Database altered.

当然如果现在validate datafile 数据文件中跳过的block 会查出”ALL ZERO”的勘误块, 也可以不用管后面数据库会reuse时重新reformat block, 但是当前rman默认备份可能会因发现corruptd block而中断,需要特殊配置。 也可以手动造数据fill datafile 尝试reformat block. https://www.anbob.com/archives/2573.html 有相关脚本。

SQL> select * from v$database_block_corruption;

     FILE#     BLOCK#     BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
        36     142864          1         1.2845E+10 LOGICAL
        36     142832         32                  0 ALL ZERO
        36     142865         58                  0 ALL ZERO

如果坏道发生成ASM disk上,确认位置要麻烦一些, 需要从X$KFFXP VIEW确认决对位置,X$KFFXP (metadata, file extent pointers),字段含义

GROUP_KFFXP       diskgroup number (1 - 63) ASM disk group number. Join with v$asm_disk and v$asm_diskgroup
NUMBER_KFFXP      file number for the extent ASM file number. Join with v$asm_file and v$asm_alias
COMPOUND_KFFXP    (group_kffxp << 24) + file # File identifier. Join with compound_index in v$asm_file
INCARN_KFFXP      file incarnation number File incarnation id. Join with incarnation in v$asm_file
PXN_KFFXP         physical extent number  Extent number per file
XNUM_KFFXP        extent number bit 31 set if indirect Logical extent number per file (mirrored extents have the same value)
LXN_KFFXP         logical extent number 0,1 used to identify primary/mirror extent,2 identifies file header allocation unit (hypothesis) used in the query such that we go after only the primary extents, not secondary extents 
DISK_KFFXP        disk on which AU is located  Disk number where the extent is allocated.Join with v$asm_disk Relative position of the allocation unit from the beginning of the disk. 
AU_KFFXP          AU number on disk of AU allocation unit size (1 MB) in v$asm_diskgroup
CHK_KFFXP         unknown 可能是范围为[0-256]的某种校验值
SIZE_KFFXP        size_kffxp is used such that we account for variable sized extents. sum(size_kffxp) provides the number of AUs that are on that disk.

# 查询映射关系

set linesize 140 pagesize 1400
col "FILE NAME" format a40
set head on
select NAME         "FILE NAME",
       NUMBER_KFFXP "FILE NUMBER",
       XNUM_KFFXP   "EXTENT NUMBER",
       DISK_KFFXP   "DISK NUMBER",
       AU_KFFXP     "AU NUMBER",
       SIZE_KFFXP   "NUMBER of AUs"
  from x$kffxp, v$asm_alias
 where GROUP_KFFXP = GROUP_NUMBER
   and NUMBER_KFFXP = FILE_NUMBER
   and system_created = 'Y'
   and lxn_kffxp = 0
 order by name;


12c开始还可以使用asmcmd 的mapau和mapextent查看映射关系。 更多内容可以查看ASM INTERNAL,

对于有坏道的数据文件可以搞到文件系统,组装后再搞回ASM, 也可以只修改ASM metadata指针到一个其它AU,欺骗oracle, 但是RB后可能还有有问题,因此需要及时重建数据库。

定位可以参考 如何在asm上定位数据块

select INCARN_KFFXP,XNUM_KFFXP,LXN_KFFXP,DISK_KFFXP,AU_KFFXP,'dd if='||b.path||' of=/tmp/file'||NUMBER_KFFXP||'.dbf'||' conv=notrunc bs=1048576 skip='||AU_KFFXP||' seek='||XNUM_KFFXP||' count=1' 
from x$kffxp a,
v$asm_disk b  wHere inst_id=1 
and GROUP_KFFXP=5 
AND NUMBER_KFFXP=256  
and XNUM_KFFXP=(ceil((8192*1389)/1048576)-1) --start 0
and a.GROUP_KFFXP=b.GROUP_NUMBER and a.DISK_KFFXP=b.DISK_NUMBER and LXN_KFFXP=0
order by XNUM_KFFXP,LXN_KFFXP;

INCARN_KFFXP XNUM_KFFXP  LXN_KFFXP DISK_KFFXP   AU_KFFXP 'DDIF='||B.PATH||'OF=/TMP/FILE'||NUMBER_KFFXP||'.DBF'||'CONV=NOTRUNCBS=1048576SKIP='||AU_KFFXP||'SEEK='||XNUM_KFFXP||'COUNT=1'
------------ ---------- ---------- ---------- ---------- -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  1057656433         10          0          1         24 dd if=/dev/asm-diski of=/tmp/file256.dbf conv=notrunc bs=1048576 skip=24 seek=10 count=1
  

查找file 1 block 1(FileDirectory blk 1 AU num )

$ kfed read /dev/asm-diskh|grep -E "f1b1|ausize"
kfdhdb.ausize:                  1048576 ; 0x0bc: 0x00100000
kfdhdb.f1b1locn:                      2 ; 0x0d4: 0x00000002 #非0值

$ kfed read /dev/asm-diskh aun=2 blkn=1|head -10
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                       1 ; 0x004: blk=1
kfbh.block.obj:                       1 ; 0x008: file=1
kfbh.check:                  4176897006 ; 0x00c: 0xf8f663ee
kfbh.fcn.base:                      261 ; 0x010: 0x00000105
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000

根据file Directory查找disk Directory

$  kfed read /dev/asm-diskh aun=2 blkn=1|grep kfffde|head -20
kfffde[0].xptr.au:                    2 ; 0x4a0: 0x00000002
kfffde[0].xptr.disk:                  0 ; 0x4a4: 0x0000
kfffde[0].xptr.flags:                 0 ; 0x4a6: L=0 E=0 D=0 S=0
kfffde[0].xptr.chk:                  40 ; 0x4a7: 0x28
kfffde[1].xptr.au:                   20 ; 0x4a8: 0x00000014   # FILE DIR AU 1 AT DISK 1 AU 20
kfffde[1].xptr.disk:                  1 ; 0x4ac: 0x0001       # disk 1
kfffde[1].xptr.flags:                 0 ; 0x4ae: L=0 E=0 D=0 S=0
kfffde[1].xptr.chk:                  63 ; 0x4af: 0x3f
kfffde[2].xptr.au:           4294967295 ; 0x4b0: 0xffffffff   # 保留值
kfffde[2].xptr.disk:              65535 ; 0x4b4: 0xffff
kfffde[2].xptr.flags:                 0 ; 0x4b6: L=0 E=0 D=0 S=0

这个文件上只有1 个file directory, 0 file directory 是留给ASM METADATA。

# kfed read /dev/asm-diski aun=20 blkn=0|head -10
kfbh.endian:                          1 ; 0x000: 0x01
kfbh.hard:                          130 ; 0x001: 0x82
kfbh.type:                            4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt:                          1 ; 0x003: 0x01
kfbh.block.blk:                     256 ; 0x004: blk=256  # asm file  256
kfbh.block.obj:                       1 ; 0x008: file=1
kfbh.check:                  3175892269 ; 0x00c: 0xbd4c452d
kfbh.fcn.base:                      540 ; 0x010: 0x0000021c
kfbh.fcn.wrap:                        0 ; 0x014: 0x00000000
kfbh.spare1:                          0 ; 0x018: 0x00000000

然后kfed merge修改AU 指针,启动ASM就可以。

打赏

,

对不起,这篇文章暂时关闭评论。