DBV not always correct, as in an extreme case the use of raw device
前几天有人反映数据库里用dbv 检查时发现了很多坏块,但是RMAN检查确没有问题,略深的研究了一下dbv这个工具,这里简单的记录一下dbv的检测方法,下篇会分享那个案例.这里我会记录一个极端情况下的dbv存在的问题,环境 11.2.0.3 2NODES RAC on aix 6.1, 存储是RAW DEVICE.
TABLESPACE: TEST BLOCK SIZE: 32768 NUMBER OF DATAFILES: 1 DATAFILES : /dev/rjf_lv16_412 FILE_ID:819 SQL> create table t tablespace test as select * from dba_objects; Table created. SQL> ALTER TABLE t allocate extent(datafile '/dev/rjf_lv16_412' SIZE 1g); Table altered. SQL> select blocks from v$datafile where file#=819; BLOCKS ---------- 552959 SQL> host oracle@kdjf1:/home/oracle>dbv file=/dev/rjf_lv16_412 blocksize=32768 DBVERIFY: Release 11.2.0.3.0 - Production on Wed Sep 7 11:01:26 2016 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /dev/rjf_lv16_412 DBVERIFY - Verification complete Total Pages Examined : 32768 Total Pages Processed (Data) : 325 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 31 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 32412 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 2828425018 (3482.2828425018) oracle@kdjf1:/home/oracle>dbfsize /dev/rjf_lv16_412 Database file: /dev/rjf_lv16_412 Database file type: raw device without 4K starting offset Database file size: 32768 32768 byte blocks SQL> ALTER TABLE t allocate extent(datafile '/dev/rjf_lv16_412' SIZE 1g); Table altered. oracle@kdjf1:/home/oracle>dbfsize /dev/rjf_lv16_412 Database file: /dev/rjf_lv16_412 Database file type: raw device without 4K starting offset Database file size: 32768 32768 byte blocks oracle@kdjf1:/home/oracle>dbv file=/dev/rjf_lv16_412 blocksize=32768 DBVERIFY: Release 11.2.0.3.0 - Production on Wed Sep 7 11:03:25 2016 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /dev/rjf_lv16_412 DBVERIFY - Verification complete Total Pages Examined : 32768 Total Pages Processed (Data) : 325 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 31 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 32412 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 2828425018 (3482.2828425018) oracle@kdjf1:/home/oracle>rman target / Recovery Manager: Release 11.2.0.3.0 - Production on Wed Sep 7 11:04:20 2016 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. connected to target database: BILL (DBID=1844724644) RMAN> backup validate datafile 819; channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25 List of Datafiles ================= File Status Marked Corrupt Empty Blocks Blocks Examined High SCN ---- ------ -------------- ------------ --------------- ---------- 819 OK 0 552418 552959 14957906826173 File Name: /dev/rjf_lv16_412 Block Type Blocks Failing Blocks Processed ---------- -------------- ---------------- Data 0 325 Index 0 0 Other 0 216
TIP:
可以确认表空间大小552959个blocks, dbv只扫描了32768个block(1GB),表空间上我手动扩展了表的分区为2个GB,DBV仍然也是未认到,但是RMAN扫描没有问题,是数据文件的总块数552959. 可能你会认为表上现在没数据,下面我们用实际insert些数据来覆盖掉这2G的区看看DBV会不会认到。
SQL> insert into t select * from t;
79807 rows created.
SQL> /
...
SQL> select bytes from dba_segments where segment_name='TANBOB';
BYTES
----------
2165309440
SQL> select sum(bytes)/1024/1024 from dba_segments where segment_name='T';
SUM(BYTES)/1024/1024
--------------------
2065
SQL> @gts t
Gather Table Statistics for table t...
PL/SQL procedure successfully completed.
SQL> alter table t rename to tanbob;
Table altered.
SQL> @tab tanbob
Show tables matching condition "%tanbob%" (if schema is not specified then current user tables only are shown)...
OWNER TABLE_NAME TYPE NUM_ROWS BLOCKS EMPTY AVGSPC ROWLEN TAB_LAST_ANALYZED DEGREE COMPRESS
-------- ------------ ---- ------------ ------------- --------- ------ ------ ----------------- ---------- --------
SYS TANBOB TAB 5107648 20926 0 0 114 20160907 11:16:18 1 DISABLED
TIP:
根据每行字节数114大概的估算一下覆盖所有2G的数据块需要多少行。继续做几次原表insert直到segment分配新extent.
SQL> insert into tanbob select * from tanbob; 10215296 rows created. SQL> commit; Commit complete. SQL> select bytes,blocks from dba_segments where segment_name='TANBOB'; BYTES BLOCKS ---------- ---------- 2769289216 84512 SQL> select name from v$datafile where file#=819; NAME ------------------------------ /dev/rjf_lv16_412 oracle@kdjf1:/home/oracle>dbv file=/dev/rjf_lv16_412 blocksize=32768 Total Pages Examined : 32768 Total Pages Processed (Data) : 32584 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 184 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 0 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 2862198939 (3482.2862198939) oracle@kdjf1:/home/oracle>rman target / Recovery Manager: Release 11.2.0.3.0 - Production on Wed Sep 7 14:32:38 2016 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. connected to target database: BILL (DBID=1844724644) RMAN> backup validate datafile 819; input datafile file number=00819 name=/dev/rjf_lv16_412 channel ORA_DISK_1: backup set complete, elapsed time: 00:00:25 List of Datafiles ================= File Status Marked Corrupt Empty Blocks Blocks Examined High SCN ---- ------ -------------- ------------ --------------- ---------- 819 OK 0 469424 552959 14957938323611 File Name: /dev/rjf_lv16_412 Block Type Blocks Failing Blocks Processed ---------- -------------- ---------------- Data 0 83301 Index 0 0 Other 0 234
TIP:
直到数据格式化后DBV还是正确的扫描, RMAN扫描的数据和实际一致, 为什么会出现这种情况呢? 其实是和创建表空间方式及DBV 读取信息的位置相关。下面是我创建表空间的方法.
SQL> create tablespace test datafile '/dev/rjf_lv16_412' size 1g; Tablespace created. SQL> drop tablespace test; Tablespace dropped. SQL> create tablespace test datafile '/dev/rjf_lv16_412'; Tablespace created. Note: 注意上面的顺序,开始我创建表空间时给一个16g的裸设备指定了1G的大小,然后删掉表空间,第二次创建表空间时还使用了原来的LV 裸设备并未指定大小(这种方式数据文件的大小实为lv的实际大小),所以如果最后一次使用数据文件未指定大小时,dbv读取的还是上一次创建表空间时指定的数据文件大小,我已做测试验证。如果lv 之前从未指定过大小呢? 下面是我的测试.
SQL> create tablespace test datafile '/dev/rjf_lv16_590'; Tablespace created. oracle@kdjf1:/home/oracle>dbfsize /dev/rjf_lv16_590 Database file: /dev/rjf_lv16_590 Database file type: raw device without 4K starting offset Database file size: 0 32768 byte blocks SQL> select name ,bytes,blocks,block_size from v$datafile where file#=819; NAME BYTES BLOCKS BLOCK_SIZE ------------------------------ --------------- --------------- --------------- /dev/rjf_lv16_590 18119360512 552959 32768 oracle@kdjf1:/home/oracle>dbv file=/dev/rjf_lv16_590 blocksize=32768 DBVERIFY: Release 11.2.0.3.0 - Production on Wed Sep 7 16:07:25 2016 Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = /dev/rjf_lv16_590 DBVERIFY - Verification complete Total Pages Examined : 0 Total Pages Processed (Data) : 0 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 0 Total Pages Failing (Index): 0 Total Pages Processed (Other): 0 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 0 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 0 (0.0)
Note:
如果之前LV从未使用过,且增加数据文件时使用的LV未指定大小, 这个DBV扫描全是0, 无论LV上有没有数据,使用RMAN也可以正常扫描,这里不在演示, 可以猜测创建表空间时如果指定了大小会在lv的文件头上直接写入, 下面可以尝试读取文件头,这里我还是使用了开始使用的曾经指定过1GB size和从未指定的两个LV。
oracle@kdjf1:/home/oracle>dd if=/dev/rjf_lv16_412 of=/tmp/1g.dd bs=32768 count=10 10+0 records in. 10+0 records out. oracle@kdjf1:/home/oracle>dd if=/dev/rjf_lv16_590 of=/tmp/zero.dd bs=32768 count=10 10+0 records in. 10+0 records out. oracle@kdjf1:/home/oracle>od -x /tmp/1g.dd 0000000 00e2 0000 ffc0 0000 0000 0000 0000 0105 0000020 d991 0000 0000 8000 0000 8000 7a7b 7c7d 0000040 0000 21b0 0000 0000 0000 0000 0000 0000 0000060 0000 0000 0000 0000 0000 0000 0000 0000 * oracle@kdjf1:/home/oracle>od -x /tmp/zero.dd 0000000 00e2 0000 ffc0 0000 0000 0000 0000 0000 0000020 5894 0000 0000 8000 0000 0000 7a7b 7c7d 0000040 0000 21b0 0000 0000 0000 0000 0000 0000 0000060 0000 0000 0000 0000 0000 0000 0000 0000 *
TIP:
猜测第一个8000的位置应该是block_size, 第二个红色加粗的位置应该是number of blocks, 这样就可以猜测,DBV应该是根据这个位置确认扫描的裸设备的长度, 该现象通过resize 指定大小重写入这个blocks应该可以解决。 这让我想起了关于block 0(zero)扫描为什么dbv可以发现,Lunar的PPT提到的。
Corruption in Block Zero will not cause damage to the data and in versions lower than 11g it is not detected by dbverify/rman. In 11g Dbverify has been enhanced to detect it.
Summary:
RAW DEVICE可以在增加数据文件时不指定文件大小,可用空间这样通常是RAW Device的实际大小, 但是文件头上不会写入可用块数,块大小会写入, 这种情况下DBV工具无法从文件头正确的获取blocks数,所以产生错误的扫描块数结果。在不指定大小的情况下,如果RAW Device曾经文件头上有记录之前的blocks,在新加入数据库时也不会擦写该位置,这样后期在使用DBV时的结果就不正确, RMAN 验证不存在这个问题。
对不起,这篇文章暂时关闭评论。