揭秘 Oracle 11gR2 RAC ASM Instance 启动
在ORACLE 11G RAC或使用ASM环境, 存储ASM instance spfile到ASM instance,听上去很奇怪,启动实例所需要的文件在自已的实例里,对于ASM和ASM spfile先有鸡还是先有蛋的问题,首先要解决的几个问题:
1,asm diskgroup 没mount,如何读spfile?
2,在oracle_home/dbs下没有参数文件,如何知道spfile在哪?
3,不知道spfile,怎么知道的asm_diskstring?
我在一个11203 2nodes linux rac做个实验
[grid@znode1 ~]$ crsctl check crs CRS-4638: Oracle High Availability Services is online CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online [grid@znode1 ~]$ crs_stat -t Name Type Target State Host ------------------------------------------------------------ ora.CRSDG.dg ora....up.type ONLINE ONLINE znode1 ora.DBDG.dg ora....up.type ONLINE ONLINE znode1 ora.FLRV.dg ora....up.type ONLINE ONLINE znode1 ora....ER.lsnr ora....er.type ONLINE ONLINE znode1 ora....N1.lsnr ora....er.type ONLINE ONLINE znode1 ora....N2.lsnr ora....er.type ONLINE ONLINE znode2 ora....N3.lsnr ora....er.type ONLINE ONLINE znode2 ora.asm ora.asm.type ONLINE ONLINE znode1 ora.cvu ora.cvu.type ONLINE ONLINE znode2 ora.gsd ora.gsd.type OFFLINE OFFLINE ora....network ora....rk.type ONLINE ONLINE znode1 ora.oc4j ora.oc4j.type ONLINE ONLINE znode2 ora.ons ora.ons.type ONLINE ONLINE znode1 ora.rac.db ora....se.type ONLINE ONLINE znode1 ora.scan1.vip ora....ip.type ONLINE ONLINE znode1 ora.scan2.vip ora....ip.type ONLINE ONLINE znode2 ora.scan3.vip ora....ip.type ONLINE ONLINE znode2 ora....SM1.asm application ONLINE ONLINE znode1 ora....E1.lsnr application ONLINE ONLINE znode1 ora.znode1.gsd application OFFLINE OFFLINE ora.znode1.ons application ONLINE ONLINE znode1 ora.znode1.vip ora....t1.type ONLINE ONLINE znode1 ora....SM2.asm application ONLINE ONLINE znode2 ora....E2.lsnr application ONLINE ONLINE znode2 ora.znode2.gsd application OFFLINE OFFLINE ora.znode2.ons application ONLINE ONLINE znode2 ora.znode2.vip ora....t1.type ONLINE ONLINE znode2 [grid@znode1 ~]$ srvctl stop database -d rac [grid@znode1 ~]$ crsctl stop resource ora.FLRV.dg [grid@znode1 ~]$ crsctl stop resource ora.DBDG.dg [grid@znode1 ~]$ crsctl stop resource ora.CRSDG.dg [grid@znode1 ~]$ crsctl stop resource ora.asm [grid@znode1 ~]$ sqlplus / as sysasm SQL*Plus: Release 11.2.0.3.0 Production on Fri Jul 20 08:58:45 2012 Copyright (c) 1982, 2011, Oracle. All rights reserved. Connected to an idle instance.
用strace命令跟踪一下进程,也可以用truss,dtrace等平台工具。
[session 2] [root@znode1 ~]# ps -ef|grep oracle grid 20642 20639 0 08:58 ? 00:00:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) root 20756 20709 0 09:00 pts/2 00:00:00 grep oracle [root@znode1 ~]# ps -ef|grep 20639 grid 20639 20638 0 08:58 pts/3 00:00:00 sqlplus as sysasm grid 20946 20639 0 09:03 ? 00:00:00 oracle+ASM1 (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq))) root 21460 21401 0 09:09 pts/4 00:00:00 grep 20639 [root@znode1 ~]# strace -o asmstartup.tra -p 20642 Process 20642 attached - interrupt to quit [session1] SQL> startup nomount ASM instance started Total System Global Area 284565504 bytes Fixed Size 1344456 bytes Variable Size 258055224 bytes ASM Cache 25165824 bytes SQL> [session2 显示] [root@znode1 ~]# strace -o asmstartup.tra -p 20642(不用再重新输入) Process 20642 attached - interrupt to quit Process 20642 detached [root@znode1 ~]# [root@znode1 ~]# vi asmstartup.tra read(9, "\0001\0\0\6\0\0\0\0\0\3\212\6\376\377\377\377\0\1\0\0\376\377\377\377\376\377\377\377\0\0\0"..., 8208) = 49 mmap2(NULL, 196608, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6ab6000 times({tms_utime=0, tms_stime=3, tms_cutime=0, tms_cstime=0}) = 522261802 ...省略 uname({sys="Linux", node="znode1", ...}) = 0 gettimeofday({1342746189, 667048}, NULL) = 0 getuid32() = 1100 getppid() = 20639 ...省略 connect(6, {sa_family=AF_FILE, path="/var/tmp/.oracle/sOCSSD_LL_znode1_"...}, 110) = 0 recv(6, "0\0\0\0\1\0\1\1\1\0\1\0\1\0\0\0\0\0\0\0\1\0\0\0\254\305!\0\rq\0\0"..., 10240, 0) = 48 recv(6, 0x119c6768, 10240, 0) = -1 EAGAIN (Resource temporarily unavailable) recv(6, 0x119c6768, 10240, 0) = -1 EAGAIN (Resource temporarily unavailable) recv(6, 0x119c6768, 10240, 0) = -1 EAGAIN (Resource temporarily unavailable) recv(6, 0x119c6768, 10240, 0) = -1 EAGAIN (Resource temporarily unavailable) open("/u01/app/11.2.0/grid/auth/css/znode1/A6126639/9dc2315", O_WRONLY|O_CREAT|O_EXCL, 0644) = 13 fchmod(13, 0644) = 0 write(13, "k\205\235\177", 4) = 4 close(13)
TIP:从这段文本可以看到进程去连接OCSSD 验证自身身份,所以这也是为什么在启asm之前要先启css服务,如果连接失败自然ASM启动就会出错,如果成功就可以通过socket file /var/tmp/.oracle/sCRSD_UI_SOCK与OHASD通信,上篇也说过是OHASD负责OLR管理,所以就可以从OLR中得到SPFILE 路径信息和asm_diskstring,从接下来的输出可以看到
open("/etc/oracle/olr.loc", O_RDONLY) = 13 fstat64(13, {st_mode=S_IFREG|0644, st_size=82, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6c77000 read(13, "olrconfig_loc=/u01/app/11.2.0/gr"..., 4096) = 82 read(13, "", 4096) = 0 close(13) = 0 munmap(0xb6c77000, 4096) = 0 open("/etc/oracle/ocr.loc", O_RDONLY) = 13 fstat64(13, {st_mode=S_IFREG|0644, st_size=38, ...}) = 0 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb6c77000 read(13, "ocrconfig_loc=+crsdg\nlocal_only="..., 4096) = 38 read(13, "", 4096) = 0 close(13) = 0 ... stat64("/etc/localtime", {st_mode=S_IFREG|0644, st_size=405, ...}) = 0 stat64("/u01/app/11.2.0/grid/gpnp/znode1/wallets/peer/cwallet.sso", {st_mode=S_IFREG|0700, st_size=6301, ...}) = 0 lstat64("/u01/app/11.2.0/grid/gpnp/znode1/wallets/peer/cwallet.sso", {st_mode=S_IFREG|0700, st_size=6301, ...}) = 0 open("/u01/app/11.2.0/grid/gpnp/znode1/wallets/peer/cwallet.sso", O_RDONLY|O_LARGEFILE) = 18 write(3, "kfspFileNameGet name=+CRSDG/rac/"..., 76) = 76 write(3, "\n", 1) = 1 times({tms_utime=2, tms_stime=19, tms_cutime=0, tms_cstime=0}) = 522261855 write(3, "kgfspb_shallow_discover dscstr=\""..., 33) = 33 write(3, "\n", 1) = 1 open("/dev/raw", O_RDONLY|O_NONBLOCK|O_LARGEFILE|O_DIRECTORY) = 21 fcntl64(21, F_SETFD, FD_CLOEXEC) = 0 getdents64(21, /* 8 entries */, 32768) = 192 getdents64(21, /* 0 entries */, 32768) = 0 close(21) = 0 access("/dev/raw/raw4", R_OK|W_OK) = 0 stat64("/dev/raw/raw4", {st_mode=S_IFCHR|0660, st_rdev=makedev(162, 4), ...}) = 0 open("/dev/raw/raw4", O_RDONLY|O_LARGEFILE) = 21
Tip:
正如人们所期望的那样,OHASD 从 OLR 读取此信息,它的路径,它来自 /etc/oracle/olr.loc。我想指出,对于 Oracle Restart 模式,在 Oracle RAC 环境信息中,它是正确的 存储在 GPNP 配置文件中,并且有维护和管理此配置文件信息的 GPNPD 进程
write(3, "kgfdp_query grp CRSDG", 21) = 21 write(3, "\n", 1) = 1 times({tms_utime=2, tms_stime=20, tms_cutime=0, tms_cstime=0}) = 522261860 write(3, "kgfdp_create", 12) = 12 write(3, "\n", 1) = 1 times({tms_utime=2, tms_stime=20, tms_cutime=0, tms_cstime=0}) = 522261861 write(3, "kgfdp_initCall ausize=1048576 bl"..., 61) = 61 write(3, "\n", 1) = 1 times({tms_utime=2, tms_stime=20, tms_cutime=0, tms_cstime=0}) = 522261861 write(3, "kgfdp_doQuery sync=2", 20) = 20 write(3, "\n", 1) = 1 times({tms_utime=2, tms_stime=20, tms_cutime=0, tms_cstime=0}) = 522261861 write(3, "kgfdp_read", 10) = 10 write(3, "\n", 1) = 1 times({tms_utime=2, tms_stime=20, tms_cutime=0, tms_cstime=0}) = 522261861 write(3, "kgfdp_readMeta totPst=6 blksz=40"..., 34) = 34 write(3, "\n", 1) = 1
TIP:
“kgfspb_shallow_discover dscstr=\”” 对应的是asm_diskstring,扫描ASM DISK的路径,如果是用ASMLIB或dev/raw/或/dev/oracleasm/disks/在linux环境中值为null,然后扫描了每个磁盘的文件头读取元数据,从文件头可以得到磁盘组信息、spfile、voting disk file 位置,分别是,kfdhdb.dskname,kfdhdb.spfile, kfdhdb.spfflg (first block and number of blocks)and kfdhdb.vfstart, kfdhdb.vfend (begin block and end block). 可以用$ORACLE_HOME/bin下的kfed工具读硬盘文件。
[grid@znode1 ~]$ kfed read /dev/raw/raw1|more kfbh.type: 1 ; 0x002: KFBTYP_DISKHEAD --Indicates that this is ASM disk header kfdhdb.dsknum: 0 ; 0x024: 0x0000 -- disk number kfdhdb.grptyp: 2 ; 0x026: KFDGTP_NORMAL -- Indicates mirroring level, in my case it is NORMAL kfdhdb.hdrsts: 3 ; 0x027: KFDHDR_MEMBER -- number of member kfdhdb.dskname: CRSDG_0000 ; 0x028: length=10 -- disk name kfdhdb.grpname: CRSDG ; 0x048: length=5 -- disk group name kfdhdb.fgname: CRSDG_0000 ; 0x068: length=10 -- To which failure group this disk belongs kfdhdb.secsize: 512 ; 0x0b8: 0x0200 -- Disk sector size kfdhdb.blksize: 4096 ; 0x0ba: 0x1000 -- Disk block size kfdhdb.ausize: 1048576 ; 0x0bc: 0x00100000 -- Disk au size kfdhdb.mfact: 113792 ; 0x0c0: 0x0001bc80 kfdhdb.dsksize: 964 ; 0x0c4: 0x000003c4 -- disk total size MB kfdhdb.vfstart: 256 ; 0x0ec: 0x00000100 -- Begin block address of voting disk file kfdhdb.vfend: 288 ; 0x0f0: 0x00000120 -- End block address of voting disk file kfdhdb.spfile: 59 ; 0x0f4: 0x0000003b -- Begin block address of spfile kfdhdb.spfflg: 1 ; 0x0f8: 0x00000001 -- Number of blocks containing spfile
从X$KFFXP也可以查询到这个位置
X$KFFXP contains the physical allocation table for each ASM file, that is it contains the mapping between ASM files (identified by columns NUMBER_KFFXP and COMPOUND_KFFXP) and their location on disk is Column Name Description ADDR table address/identifier INDX row identifier INST_ID instance number (RAC) NUMBER_KFFXP ASM file number. Join with v$asm_file and v$asm_alias COMPOUND_KFFXP File identifier. Join with compound_index in v$asm_file INCARN_KFFXP File incarnation id. Join with incarnation in v$asm_file PXN_KFFXP Extent number per file XNUM_KFFXP Logical extent number per file (mirrored extents have the same value) GROUP_KFFXP ASM disk group number. Join with v$asm_disk and v$asm_diskgroup DISK_KFFXP Disk number where the extent is allocated. Join with v$asm_disk AU_KFFXP Relative position of the allocation unit from the beginning of the disk. The allocation unit size (1 MB) in v$asm_diskgroup LXN_KFFXP 0,1 used to identify primary/mirror extent, 2 identifies file header allocation unit (hypothesis) FLAGS_KFFXP N.K. CHK_KFFXP N.K. SQL> select NAME "FILE NAME", 2 AU_KFFXP "AU NUMBER", 3 NUMBER_KFFXP "FILE NUMBER", 4 GROUP_Number, 5 DISK_KFFXP "DISK NUMBER" 6 from x$kffxp, v$asm_alias 7 where GROUP_KFFXP = GROUP_NUMBER 8 and NUMBER_KFFXP = FILE_NUMBER 9 and lower(name) ='registry.253.787925627' 10 ; FILE NAME AU NUMBER FILE NUMBER GROUP_NUMBER DISK NUMBER ---------------------------------------- ---------- ----------- ------------ ----------- REGISTRY.253.787925627 59 253 1 0 REGISTRY.253.787925627 59 253 1 2 SQL> select group_number,path,disk_number from v$asm_disk order by 1,3; GROUP_NUMBER PATH DISK_NUMBER ------------ -------------------- ----------- 1 /dev/raw/raw1 0 1 /dev/raw/raw2 1 1 /dev/raw/raw3 2 2 /dev/raw/raw4 0 2 /dev/raw/raw5 1 3 /dev/raw/raw6 0 6 rows selected. SQL> select group_number,name from v$asm_diskgroup; GROUP_NUMBER NAME ------------ ------------------------------ 1 CRSDG 2 DBDG 3 FLRV
Tip:
可以看出spfile可以在/dev/raw/raw1、/dev/raw/raw3 的59号AU上
[grid@znode1 ~]$ cat /etc/oracle/olr.loc olrconfig_loc=/u01/app/11.2.0/grid/cdata/znode1.olr crs_home=/u01/app/11.2.0/grid strings /u01/app/11.2.0/grid/cdata/znode1.olr|grep -I asmparameterfile cat /u01/app/11.2.0/grid/gpnp/profiles/peer/profile.xml|grep -I asmparameterfile gpnptool get
都可以得到下面的信息
/u01/app/11.2.0/grid/cdata/znode1.olr、OCR 和/u01/app/11.2.0/grid/gpnp/profiles/peer/profile.xml和spfile中是都同步的么?
做个实验alter system set asm_diskstring=’/dev/raw/raw*’,发现只有u01/app/11.2.0/grid/gpnp/profiles/peer/profile.xml的输出没有修改。
所以得出的结论是
OHASD–>OLR—–>GPNP profile(DiscoveryString)—>asm_disk物理路径—>扫描所有disk header,得到spfile物理位置—>直接路径读spfile—>启动ASM—>如有更新同步OLR
reference:
http://aychin.wordpress.com
http://www.askmaclean.com/
对不起,这篇文章暂时关闭评论。