Oracle数据库当遇到存储磁盘坏道时的处理(DBV-00102)
数据库环境有时会因为硬件磁盘问题导致数据不可读,而硬盘“坏道”便是这其中最常见的问题, 当出现因为磁盘坏道里更加棘手,无法移动或跳过,更甚至因为有坏盘在换盘后RAID重组出现文件系统勘误导致文件为0bytes,增加恢复难度,例如使用dbv 检查时会出现如下报错:
$ dbv file=/oracle/oradata/anbob_01.dbf DBVERIFY: Release 10.2.0.4.0 - Production on Mon Dec 28 15:41:24 2020 Copyright (c) 1982, 2007, Oracle. All rights reserved. DBVERIFY - Verification starting : FILE = /oracle/oradata/anbob_01.dbf DBV-00102: File I/O error on FILE (/oracle/oradata/anbob_01.dbf) during verification read operation (-2) $ cp /oracle/oradata/anbob_01.dbf /oracle/oradata/anbob_01.dbf_bak cp: anbob_01.dbf : I/O error
Note:
可见文件系统复制都是报错。
出现这类问题时如果数据库文件在文件系统,可以使用dd或其它工具重组数据文件,跳过坏道数据块,重建坏道数据库涉及的对象;如果数据文件在ASM中处理相对麻烦,也是可以脚本拼抽数据文件从ASM到文件系统,或AMDU, 如果单个数据文件比较到,坏的ASM 坏比较少,可以深度bbed修改ASM METADATA指定原AU到新空白AU 位置欺骗oracle,然后尽快重建数据库。
可以自己写脚本试探数据文件的坏道范围, 文件系统的dd 读坏道时
$ dd if=/oracle/oradata/anbob_01.dbf of=/dev/null bs=8192 iseek=142843 count=100000 read: I/O error 0+0 records in 0+0 records out $ dd if=/oracle/oradata/anbob_01.dbf of=/dev/null bs=8192 iseek=142933 count=100000 100000+0 records in 100000+0 records out
从dba_extents 确认范围内数据库对象, 然后dd 组装新数据文件,rename datafile. recover datafile有可能会出现
SQL> SQL> recover datafile 36; ORA-00283: recovery session canceled due to errors ORA-12801: error signaled in parallel query server P004 ORA-00600: internal error code, arguments: [3020], [36], [142864], [151137808], [], [], [], [] ORA-10567: Redo is inconsistent with data block SQL> recover datafile 36 allow 1 corruption; Media recovery complete. SQL> alter database datafile 36 online; Database altered.
当然如果现在validate datafile 数据文件中跳过的block 会查出”ALL ZERO”的勘误块, 也可以不用管后面数据库会reuse时重新reformat block, 但是当前rman默认备份可能会因发现corruptd block而中断,需要特殊配置。 也可以手动造数据fill datafile 尝试reformat block. https://www.anbob.com/archives/2573.html 有相关脚本。
SQL> select * from v$database_block_corruption; FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO ---------- ---------- ---------- ------------------ --------- 36 142864 1 1.2845E+10 LOGICAL 36 142832 32 0 ALL ZERO 36 142865 58 0 ALL ZERO
如果坏道发生成ASM disk上,确认位置要麻烦一些, 需要从X$KFFXP VIEW确认决对位置,X$KFFXP (metadata, file extent pointers),字段含义
GROUP_KFFXP diskgroup number (1 - 63) ASM disk group number. Join with v$asm_disk and v$asm_diskgroup NUMBER_KFFXP file number for the extent ASM file number. Join with v$asm_file and v$asm_alias COMPOUND_KFFXP (group_kffxp << 24) + file # File identifier. Join with compound_index in v$asm_file INCARN_KFFXP file incarnation number File incarnation id. Join with incarnation in v$asm_file PXN_KFFXP physical extent number Extent number per file XNUM_KFFXP extent number bit 31 set if indirect Logical extent number per file (mirrored extents have the same value) LXN_KFFXP logical extent number 0,1 used to identify primary/mirror extent,2 identifies file header allocation unit (hypothesis) used in the query such that we go after only the primary extents, not secondary extents DISK_KFFXP disk on which AU is located Disk number where the extent is allocated.Join with v$asm_disk Relative position of the allocation unit from the beginning of the disk. AU_KFFXP AU number on disk of AU allocation unit size (1 MB) in v$asm_diskgroup CHK_KFFXP unknown 可能是范围为[0-256]的某种校验值 SIZE_KFFXP size_kffxp is used such that we account for variable sized extents. sum(size_kffxp) provides the number of AUs that are on that disk. # 查询映射关系 set linesize 140 pagesize 1400 col "FILE NAME" format a40 set head on select NAME "FILE NAME", NUMBER_KFFXP "FILE NUMBER", XNUM_KFFXP "EXTENT NUMBER", DISK_KFFXP "DISK NUMBER", AU_KFFXP "AU NUMBER", SIZE_KFFXP "NUMBER of AUs" from x$kffxp, v$asm_alias where GROUP_KFFXP = GROUP_NUMBER and NUMBER_KFFXP = FILE_NUMBER and system_created = 'Y' and lxn_kffxp = 0 order by name;
12c开始还可以使用asmcmd 的mapau和mapextent查看映射关系。 更多内容可以查看ASM INTERNAL,
对于有坏道的数据文件可以搞到文件系统,组装后再搞回ASM, 也可以只修改ASM metadata指针到一个其它AU,欺骗oracle, 但是RB后可能还有有问题,因此需要及时重建数据库。
定位可以参考 如何在asm上定位数据块
select INCARN_KFFXP,XNUM_KFFXP,LXN_KFFXP,DISK_KFFXP,AU_KFFXP,'dd if='||b.path||' of=/tmp/file'||NUMBER_KFFXP||'.dbf'||' conv=notrunc bs=1048576 skip='||AU_KFFXP||' seek='||XNUM_KFFXP||' count=1' from x$kffxp a, v$asm_disk b wHere inst_id=1 and GROUP_KFFXP=5 AND NUMBER_KFFXP=256 and XNUM_KFFXP=(ceil((8192*1389)/1048576)-1) --start 0 and a.GROUP_KFFXP=b.GROUP_NUMBER and a.DISK_KFFXP=b.DISK_NUMBER and LXN_KFFXP=0 order by XNUM_KFFXP,LXN_KFFXP; INCARN_KFFXP XNUM_KFFXP LXN_KFFXP DISK_KFFXP AU_KFFXP 'DDIF='||B.PATH||'OF=/TMP/FILE'||NUMBER_KFFXP||'.DBF'||'CONV=NOTRUNCBS=1048576SKIP='||AU_KFFXP||'SEEK='||XNUM_KFFXP||'COUNT=1' ------------ ---------- ---------- ---------- ---------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1057656433 10 0 1 24 dd if=/dev/asm-diski of=/tmp/file256.dbf conv=notrunc bs=1048576 skip=24 seek=10 count=1
查找file 1 block 1(FileDirectory blk 1 AU num )
$ kfed read /dev/asm-diskh|grep -E "f1b1|ausize" kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000 kfdhdb.f1b1locn: 2 ; 0x0d4: 0x00000002 #非0值 $ kfed read /dev/asm-diskh aun=2 blkn=1|head -10 kfbh.endian: 1 ; 0x000: 0x01 kfbh.hard: 130 ; 0x001: 0x82 kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR kfbh.datfmt: 1 ; 0x003: 0x01 kfbh.block.blk: 1 ; 0x004: blk=1 kfbh.block.obj: 1 ; 0x008: file=1 kfbh.check: 4176897006 ; 0x00c: 0xf8f663ee kfbh.fcn.base: 261 ; 0x010: 0x00000105 kfbh.fcn.wrap: 0 ; 0x014: 0x00000000 kfbh.spare1: 0 ; 0x018: 0x00000000
根据file Directory查找disk Directory
$ kfed read /dev/asm-diskh aun=2 blkn=1|grep kfffde|head -20 kfffde[0].xptr.au: 2 ; 0x4a0: 0x00000002 kfffde[0].xptr.disk: 0 ; 0x4a4: 0x0000 kfffde[0].xptr.flags: 0 ; 0x4a6: L=0 E=0 D=0 S=0 kfffde[0].xptr.chk: 40 ; 0x4a7: 0x28 kfffde[1].xptr.au: 20 ; 0x4a8: 0x00000014 # FILE DIR AU 1 AT DISK 1 AU 20 kfffde[1].xptr.disk: 1 ; 0x4ac: 0x0001 # disk 1 kfffde[1].xptr.flags: 0 ; 0x4ae: L=0 E=0 D=0 S=0 kfffde[1].xptr.chk: 63 ; 0x4af: 0x3f kfffde[2].xptr.au: 4294967295 ; 0x4b0: 0xffffffff # 保留值 kfffde[2].xptr.disk: 65535 ; 0x4b4: 0xffff kfffde[2].xptr.flags: 0 ; 0x4b6: L=0 E=0 D=0 S=0
这个文件上只有1 个file directory, 0 file directory 是留给ASM METADATA。
# kfed read /dev/asm-diski aun=20 blkn=0|head -10
kfbh.endian: 1 ; 0x000: 0x01
kfbh.hard: 130 ; 0x001: 0x82
kfbh.type: 4 ; 0x002: KFBTYP_FILEDIR
kfbh.datfmt: 1 ; 0x003: 0x01
kfbh.block.blk: 256 ; 0x004: blk=256 # asm file 256
kfbh.block.obj: 1 ; 0x008: file=1
kfbh.check: 3175892269 ; 0x00c: 0xbd4c452d
kfbh.fcn.base: 540 ; 0x010: 0x0000021c
kfbh.fcn.wrap: 0 ; 0x014: 0x00000000
kfbh.spare1: 0 ; 0x018: 0x00000000
然后kfed merge修改AU 指针,启动ASM就可以。
对不起,这篇文章暂时关闭评论。