Index partition unusable and wait ‘kpodplck wait before retrying ORA-54’ event during direct sqlldr
最近有人遇到用SQL*Loader 向数据库加载数据时发现很慢,并且反映有几个分区表上的local索引总是递归的变成invalid or unusable,rebuild该分区失效索引,另一个索引分区又失效。
对于失效索引开始启用了个DDL trigger,并没有发现这段时间有前台进程DDL,还有一种可能在sqlldr 加载数据时使用了直接路径时(DIRECT=TRUE) 也会导致索引失效,要来了sqlldr 的脚本如下:
sqlldr WEEJAR/WEEJAR_321@anbob_n2 control=../tmp/15861558temp.CTL log=../tmp/15861558temp.LOG data=../tmp/15861558temp.DATA parallel=flase direct=true rows=1000000
Note:
这里sqlldr确实使用了direct path load, 当sqlldr使用了直接路径加载时,对于表上的索引是使用了一种独立的排序旧的index和新增的索引值合并成新的index segment,而且对于索引维护默认是推迟到加载完所有的数据后自动维护的, 对于大表或大的索引这过程将会花费较长的时间,加载时查询index partition是invalid状态,自动维护索引rebuild前提是有足够可用的temp 表空间,否则sqlldr 的数据仍会导入,但是index会被剩下,停留在invalid or unusable状态。因为sqlldr 中没有skip_index_maintenance参数,告知了自动维护的原理并没必要跟着rebuild index, 最后等load完所有数据查询确认了无无效索引。
下面分析load慢的现象
SQL> select username,machine,program,event ,status from v$session where event like 'kpodplck wait before%'; USERNAME MACHINE PROGRAM EVENT STATUS ---------- ---------- -------------- ------------------------------------ -------- WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE ... retrying ORA-54 WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE WEEJAR anbob2 sqlldr@anbob2 ( kpodplck wait before retrying ORA-54 ACTIVE 137 rows selected. SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bi PL/SQL Release 10.2.0.4.0 - Production CORE 10.2.0.4.0 Production TNS for HPUX: Version 10.2.0.4.0 - Production NLSRTL Version 10.2.0.4.0 - Production kdtzb1:/oracle> uname -a HP-UX kdtzb1 B.11.31 U ia64 2547054003 unlimited-user license
Note:
这个event比较少见从名字上kpodplck Oracle Direct Path lock, ora-54 资源占用的到比较常见。 搜索了一下这个事件在MOS中找到了一个bug,但是在11.2.0.1中已修复,现在是在10.2.0.4和另一套11.2.0.3的库同时运行相同的load,现象都一样也是“kpodplck wait before retrying ORA-54”
Bug 10079079 : KPODPLCK WAIT BEFORE RETRYING ORA-54 REPORTED IN SQL LOADER DIRECT PATH LOAD
当时做了systemstate dump,给O记提了个sr
PROCESS 86: ---------------------------------------- SO: c0000020f653e9f0, type: 2, owner: 0000000000000000, flag: INIT/-/-/0x00 (process) Oracle pid=86, calls cur/top: c000002049698f20/c000002049698f20, flag: (0) - int error: 0, call error: 0, sess error: 0, txn error 0 (post info) last post received: 0 0 0 last post received-location: No post last process to post me: none last post sent: 0 0 0 last post sent-location: No post last process posted by me: none (latch info) wait_event=0 bits=0 Process Group: DEFAULT, pseudo proc: c0000020f861a060 O/S info: user: oracle, term: UNKNOWN, ospid: 17894 OSD pid info: Unix process pid: 17894, image: oracle@kdtzb1 Short stack dump: ksdxfstk()+48<-ksdxcb()+5776<-sspuser()+640<-<-_pw_wait()+48<-pw_wait()+128<-sskgpwwait()+384<-skgpwwait()+208<-ksliwat()+1728<-kslwaitns_timed()+112<-kskthbwt()+400<-kslwait()+640<-$cold_kpod plck()+2976<-kpodpp()+2384<-opiodr()+2128<-ttcpip()+3088<-opitsk()+2336<-opiino()+1840<-opiodr()+2128<-opidrv()+1088<-sou2o()+336<-opimai_real()+224<-main()+368<-main_opd_entry()+80 Dump of memory from 0xC0000020F352D590 to 0xC0000020F352D798 C0000020F352D590 00000005 00000000 C0000020 4B6FFCE8 [........... Ko..] C0000020F352D5A0 00000010 000313A7 C0000020 49698F20 [........... Ii. ] C0000020F352D5B0 00000003 000313A7 C0000020 F5BC5FF0 [........... .._.] C0000020F352D5C0 00000013 0003129B C0000020 F58CAB00 [........... ....] C0000020F352D5D0 0000000B 000313A7 C0000020 F579C008 [........... .y..] C0000020F352D5E0 00000004 0003129B 00000000 00000000 [................] C0000020F352D5F0 00000000 00000000 00000000 00000000 [................] Repeat 25 times C0000020F352D790 00000000 00000000 [........] ---------------------------------------- SO: c0000020f579c008, type: 4, owner: c0000020f653e9f0, flag: INIT/-/-/0x00 (session) sid: 2093 trans: 0000000000000000, creator: c0000020f653e9f0, flag: (41) USR/- BSY/-/-/-/-/- DID: 0001-0056-0008B621, short-term DID: 0001-0056-0008B622 txn branch: 0000000000000000 oct: 0, prv: 0, sql: 0000000000000000, psql: c000001feee0b850, user: 208/WEEJAR service name: SYS$USERS O/S info: user: WEEJAR, term: , ospid: 23397538, machine: anbob2 program: sqlldr@anbob2 (TNS V1-V3) application name: SQL Loader Direct Path Load, hash value=1090021382 action name: ../tmp/21823588temp.CTL, hash value=2698844265 waiting for 'kpodplck wait before retrying ORA-54' blocking sess=0x0000000000000000 seq=31 wait_time=0 seconds since wait started=16 =0, =0, =0 Dumping Session Wait History for 'kpodplck wait before retrying ORA-54' count=1 wait_time=1292889 =0, =0, =0 for 'kpodplck wait before retrying ORA-54' count=1 wait_time=986043 =0, =0, =0 for 'kpodplck wait before retrying ORA-54' count=1 wait_time=985409 =0, =0, =0 for 'kpodplck wait before retrying ORA-54' count=1 wait_time=984200 =0, =0, =0 for 'kpodplck wait before retrying ORA-54' count=1 wait_time=987391 =0, =0, =0 for 'kpodplck wait before retrying ORA-54' count=1 wait_time=982784 =0, =0, =0 for 'kpodplck wait before retrying ORA-54' count=1 wait_time=988524 =0, =0, =0 for 'kpodplck wait before retrying ORA-54' count=1 wait_time=985431 =0, =0, =0 for 'kpodplck wait before retrying ORA-54' count=1 wait_time=985767 =0, =0, =0 for 'kpodplck wait before retrying ORA-54' count=1 wait_time=983840 =0, =0, =0
SR 的恢复,该现象不是bug, 但是当使用direct=true增加parallel=true 对资源锁转换的方式来避免或减少该事件。
对不起,这篇文章暂时关闭评论。