Tuning ‘latch free’ & ‘null event’ wait event in Oracle 9i
最近遇到一套9I数据库遇到了性能问题, 现象是数据库主机CPU使用率很高应用响应缓慢,Cpu Idel几乎为0, 从v$session_wait查看数据库当前的活动会话在等待’null event’和’latch free’.
anbob% uname -a SunOS anbob 5.10 Generic_139555-08 sun4u sparc SUNW,SPARC-Enterprise anbob% sqlplus "/ as sysdba" SQL*Plus: Release 9.2.0.1.0 - Production on Fri Nov 11 11:35:34 2016 Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. Connected to: Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production With the Partitioning, OLAP and Oracle Data Mining options JServer Release 9.2.0.1.0 - Production SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle9i Enterprise Edition Release 9.2.0.1.0 - 64bit Production PL/SQL Release 9.2.0.1.0 - Production CORE 9.2.0.1.0 Production TNS for Solaris: Version 9.2.0.1.0 - Production NLSRTL Version 9.2.0.1.0 - Production SQL> select event,count(*) from v$session_wait group by event; EVENT COUNT(*) ---------------------------------------------------------------- ---------- SQL*Net message from client 66 db file scattered read 2 latch free 21 null event 83 pmon timer 1 rdbms ipc message 5 smon timer 1 wakeup time manager 1
Note:
对于null event通常是数据库暂时未最到等待事件的命名,原因可能是当时的CPU负载高,数据库HANG,bug, 本案例应该是因为latch导致的CPU耗尽,所以解决latch free先。
latch是一种访问共享资源时的内部内存结构,通常latch的持有非常的短暂,又因为latch spin非常的消耗CPU,sleeping阶段不会消耗CPU time但同时会显示latch free的wait event.
通常对于latch free需要确认相关的具体的latch和相关的SQL及涉及的对象。如果感兴趣推荐阅读去搜索。 对于latch free的三个参数如下:
P1 = Latch address
P2 = Latch number
P3 = Tries
SQL> select name from v$latchname where latch#=97; NAME ---------------------------------------------------------------- cache buffers chains SQL> select sql_hash_value,count(*) from v$session_wait a,v$session b where a.sid=b.sid and event='latch free' group by sql_hash_value; SQL_HASH_VALUE COUNT(*) --------------- --------------- ... 1993732470 1 2081869776 1 2349207484 2 2380084478 2 2695258969 1 2907909916 1 3165338618 1 3442647599 1 3655775635 1 3756755013 3 3790812144 1 3936368692 1 3980632519 2 4005557655 1 4161039002 1 4249511952 1 4274466022 2 32 rows selected. SQL> select p1,count(*) from v$session_wait a where event='latch free' group by p1; P1 COUNT(*) --------------- --------------- 16162378384 1 16162683952 9 16162801168 13 16162947472 23 select to_char(16162947472,'xxxxxxxxxxxxxxx') from dual; TO_CHAR(16162947 ---------------- 3c3630190 SQL> select * from v$latch_children where addr=upper(lpad('3c3630190',16,0)); ADDR LATCH# CHILD# LEVEL# NAME GETS MISSES SLEEPS IMMEDIATE_GETS IMMEDIATE_MISSES ---------------- -------- ------- ---------- ---------------------- --------- --------- -------- --------------- ---------------- 00000003C3630190 97 2949 1 cache buffers chains 294507 499 373 8804 3 -- wait 1s SQL> / ADDR LATCH# CHILD# LEVEL# NAME GETS MISSES SLEEPS IMMEDIATE_GETS IMMEDIATE_MISSES ---------------- -------- ------- ---------- ---------------------- --------- --------- -------- --------------- ---------------- 00000003C3630190 97 2949 1 cache buffers chains 300406 503 373 8925 3 SQL> / ADDR LATCH# CHILD# LEVEL# NAME GETS MISSES SLEEPS IMMEDIATE_GETS IMMEDIATE_MISSES ---------------- -------- ------- ---------- ---------------------- --------- --------- -------- --------------- ---------------- 00000003C3630190 97 2949 1 cache buffers chains 300966 504 373 8939 3 SQL> / ADDR LATCH# CHILD# LEVEL# NAME GETS MISSES SLEEPS IMMEDIATE_GETS IMMEDIATE_MISSES ---------------- -------- ------- ---------- ---------------------- --------- --------- -------- --------------- ---------------- 00000003C3630190 97 2949 1 cache buffers chains 301324 504 373 8948 3 SQL> select hladdr, file#, dbablk, decode(state,1,'cur ',3,'CR',state) ST, tch from x$bh where hladdr in (select addr from (select addr from v$latch_children where addr=upper(lpad('3c3630190',16,0)) order by sleeps, misses,immediate_misses desc )where rownum <2) order by tch; HLADDR FILE# DBABLK ST TCH ---------------- ---------- ---------- ---------------------------------------- ---------- ... ... 00000003C3630190 16 436368 cur 156 00000003C3630190 17 491740 cur 157 00000003C3630190 11 257281 cur 160 00000003C3630190 16 706704 cur 161 00000003C3630190 12 468301 cur 162 00000003C3630190 12 759117 cur 162 00000003C3630190 12 1017696 cur 169 00000003C3630190 11 204033 cur 170 00000003C3630190 17 475356 cur 170 00000003C3630190 12 226637 cur 174 00000003C3630190 18 248104 cur 178 00000003C3630190 12 128333 cur 179 00000003C3630190 16 1169552 cur 185 00000003C3630190 16 727184 cur 190 00000003C3630190 12 173389 cur 194 00000003C3630190 12 857952 cur 197 00000003C3630190 12 578893 cur 214 00000003C3630190 12 1029984 cur 226 00000003C3630190 18 5909 cur 228 00000003C3630190 16 1255568 cur 410 00000003C3630190 18 157992 cur 422 00000003C3630190 19 119156 cur 468 00000003C3630190 17 241884 cur 490 00000003C3630190 12 1156960 cur 508 00000003C3630190 16 907408 cur 600 00000003C3630190 12 1009504 cur 1051 <<< 125 rows selected. SQL> select owner, segment_name,partition_name, segment_type, file_id,block_id from dba_extents where file_id = 12 and 1009504 between block_id and block_id+blocks-1; OWNER SEGMENT_NAME PARTITION_NAME SEGMENT_TYPE FILE_ID BLOCK_ID ------------- -------------------------- ---------------- ------------------ ---------- ---------- LTEOMS IDX_WFWI_STATE INDEX 12 1008521 SQL> select /*+ RULE */ substr(a.event,1,25) ,substr(b.program,1,39) program, b.username,substr(b.osuser||'@'||b.machine||'@'||b.process||'@'||c.spid,1,31) client, to_char(b.logon_time,'mm-dd hh24:mi') logon_time from v$session_wait a,v$session b, v$process c where a.sid=b.sid and b.paddr=c.addr and a.event ='latch free'; PROGRAM USERNAME CLIENT LOGON_TIME --------------------------- --------------- --------------------------- ----------- JDBC Thin Client LTEOMS root@hneoms@@2172 11-11 10:16 JDBC Thin Client LTEOMS root@hneoms@@2394 11-11 10:27 JDBC Thin Client LTEOMS root@hneoms@@2222 11-11 10:18 JDBC Thin Client LTEOMS root@hneoms@@1998 11-11 10:09 JDBC Thin Client LTEOMS root@hneoms@@1974 11-11 10:07 JDBC Thin Client LTEOMS root@hneoms@@1915 11-11 10:05 6 rows selected. select hash_value,sql_text from v$sql where hash_value in( 1045805320, 1081719454, 1542141426, 1663684681, 1759938066, 1826030977, 1906825844, 1993732470, 2081869776, 2349207484, 2380084478, 2695258969, 2907909916, 3165338618, 3442647599, 3655775635, 3756755013, 3790812144, 3936368692, 3980632519, 4005557655, 4161039002, 4249511952, 4274466022 ); SELECT COUNT(*) AMOUNT FROM (SELECT DISTINCT ADVISE_NUM adviseNum,REPLY_NUM replyNum,HASTEN_NUM hastenNum,MAIN_ID mainId,TITLE title,SHEET_ID sheetId,STATUS status,SEND_TIME sendTime,SEND_DEPT_ID sendDeptId,SEND_USER_ID sendUserId,LINK_ID linkId,TASK_ACCEPT_LIMIT taskAcceptLimit,TASK_COMPLETE_LIMIT taskCompleteLimit,TASK_TYPE taskType,START_TIME startTime,END_TIME endTime,ACTIVITY_DEF_ID activityDefId,ACTIVITY_INST_NAME activityInstName,ACTIVITY_INST_ID activityInstId,PROCESS_INST_ID processInstId,PROCESS_INST_NAME processInstName,PROCESS_DEF_ID processDefId,PROCESS_DEF_NAME processDefName,WO ... ... 24 rows selected.
Note:
根据p2的值确认了是cache buffers chains latch,通常出现在hot block的SQL中,并且只有gets有明显的增长,通常从x$bh的tch列可以确认对象为一索引对象,并且有确认了相的SQL基本相同只是字面量值不同,这个sql较长这里之截取了其中一部分,因为v$sql在9i中不存在sqlfull_text, 可以写PLSQL 去v$sqltext中拼全SQL语句, 继续可以从SQL的执行计划与hot block的对象交集进行判断优化。因为9i中没有dbms_xplan package查看执行计划,这里我调用了TanelPoder的脚本。因为该SQL的执行计划有约1000行,我们可以先看与hot block 索引相关的执行计划。
SQL> @xm 3756755013 0 Ch Pr Op Objcect Optimizer Optim rows Optim bytes Optimizer ld ed ID Operation Name Cost from step from step Mode -- -- ---- ------------------------------------------------------- ------------------------------ --------- ---------- ----------- ---------- 0 0 SELECT STATEMENT 13222 CHOOSE 1 SORT AGGREGATE 1 2 VIEW 13222 64 3 SORT UNIQUE 13222 64 44736 4 VIEW 13268 64 44736 5 SORT UNIQUE 13268 64 46142 6 UNION-ALL 7 CONCATENATION 8 NESTED LOOPS 44 1 723 AF 9 HASH JOIN 43 1 697 10 NESTED LOOPS 40 1 321 11 INLIST ITERATOR F 12 TABLE ACCESS BY INDEX ROWID WFWIPARTICIPANT 6 314 26376 A 13 INDEX RANGE SCAN WF_IDX_PART_TYPE 2 12546 F 14 TABLE ACCESS BY INDEX ROWID WFWORKITEM 40 1799 426363 A 15 INDEX UNIQUE SCAN PK_WFWORKITEM 815 1 16 TABLE ACCESS FULL WS_CIRCUITSHEET_TASK 2 409 153784 F 17 TABLE ACCESS BY INDEX ROWID WS_CIRCUITSHEET_MAIN 1 1 26 A 18 INDEX UNIQUE SCAN WS_PK_CIRCUITSHEET_MAIN 102 19 NESTED LOOPS 44 1 723 AF 20 HASH JOIN 43 1 697 21 TABLE ACCESS BY INDEX ROWID WFWORKITEM 40 1799 426363 22 NESTED LOOPS 40 1 321 23 INLIST ITERATOR F 24 TABLE ACCESS BY INDEX ROWID WFWIPARTICIPANT 6 314 26376 A 25 INDEX RANGE SCAN WF_IDX_PART_TYPE 2 12546 26 BITMAP CONVERSION TO ROWIDS 27 BITMAP AND 28 BITMAP CONVERSION FROM ROWIDS A 29 INDEX RANGE SCAN PK_WFWORKITEM 1 30 BITMAP CONVERSION FROM ROWIDS A 31 INDEX RANGE SCAN IDX_WFWI_STATE 1 1 <<<<<<<<< 32 TABLE ACCESS FULL WS_CIRCUITSHEET_TASK 2 409 153784 F 33 TABLE ACCESS BY INDEX ROWID WS_CIRCUITSHEET_MAIN 1 1 26 A 34 INDEX UNIQUE SCAN WS_PK_CIRCUITSHEET_MAIN 102 35 CONCATENATION 36 NESTED LOOPS 141 1 786 AF 37 HASH JOIN 140 1 760 38 NESTED LOOPS 40 1 321 39 INLIST ITERATOR F 40 TABLE ACCESS BY INDEX ROWID WFWIPARTICIPANT 6 314 26376 A 41 INDEX RANGE SCAN WF_IDX_PART_TYPE 2 12546 F 42 TABLE ACCESS BY INDEX ROWID WFWORKITEM 40 10 2370 A 43 INDEX UNIQUE SCAN PK_WFWORKITEM 815 1 44 TABLE ACCESS FULL WS_CUTMANAGESHEET_TASK 86 71389 31339771 F 45 TABLE ACCESS BY INDEX ROWID WS_CUTMANAGESHEET_MAIN 1 1 26 A 46 INDEX UNIQUE SCAN WS_PK_CUTMANAGESHEET_MAIN 100 47 NESTED LOOPS 141 1 786 AF 48 HASH JOIN 140 1 760 49 TABLE ACCESS BY INDEX ROWID WFWORKITEM 40 10 2370 50 NESTED LOOPS 40 1 321 51 INLIST ITERATOR F 52 TABLE ACCESS BY INDEX ROWID WFWIPARTICIPANT 6 314 26376 A 53 INDEX RANGE SCAN WF_IDX_PART_TYPE 2 12546 54 BITMAP CONVERSION TO ROWIDS 55 BITMAP AND 56 BITMAP CONVERSION FROM ROWIDS A 57 INDEX RANGE SCAN PK_WFWORKITEM 1 58 BITMAP CONVERSION FROM ROWIDS A 59 INDEX RANGE SCAN IDX_WFWI_STATE 1 1 60 TABLE ACCESS FULL WS_CUTMANAGESHEET_TASK 86 71389 31339771 F 61 TABLE ACCESS BY INDEX ROWID WS_CUTMANAGESHEET_MAIN 1 1 26 A 62 INDEX UNIQUE SCAN WS_PK_CUTMANAGESHEET_MAIN 100 ... ... A 837 INDEX RANGE SCAN IX_HNRESONE_T_WORKID 1 1 F 838 TABLE ACCESS BY INDEX ROWID WS_HNRESONE_MAIN 1 53 1378 A 839 INDEX UNIQUE SCAN WS_PK_HNRESONE_MAIN 100 Ch Op ld ID Predicate Information (identified by operation id): -- ------ ---------------------------------------------------------------------------------------------------- 0 9 - access("WORKITEM_ID"="WORKITEMID") 9 - filter("CURRENTSTATE"=4 OR "CURRENTSTATE"=10 OR "PROLONG_READ_FLAG">0 OR "PHASE_BACK_READ_FLAG">0) 12 - filter("PARTIINTYPE"='EXE' OR "PARTIINTYPE"='GET') 13 - access("PARTICIPANTTYPE"='role' AND ("PARTICIPANT"='8a9580bc22fe4d430123037a68e208fd' OR "PARTICIPANT"='8a9580bc29781f21012981a824492b13')) 14 - filter("CURRENTSTATE"="WORKITEMSTATE") 15 - access("WORKITEMID"="WORKITEMID") 17 - filter("STATUS"=2) 18 - access("MAIN_ID"="MAIN_ID") 20 - access("WORKITEM_ID"="WORKITEMID") 20 - filter("CURRENTSTATE"=4 OR "CURRENTSTATE"=10 OR "PROLONG_READ_FLAG">0 OR "PHASE_BACK_READ_FLAG">0) 24 - filter(("PARTIINTYPE"='EXE' OR "PARTIINTYPE"='GET') AND (LNNVL("PARTICIPANTTYPE"='role') OR LNNVL("PARTICIPANT"='8a9580bc22fe4d430123037a68e208fd') AND LNNVL("PARTICIPANT"='8a9580bc29781f21012981a824492b13'))) 25 - access(("PARTICIPANTTYPE"='person' AND "PARTICIPANT"='wangqm' OR "PARTICIPANTTYPE"='organization' AND "PARTICIPANT"='1238')) 29 - access("WORKITEMID"="WORKITEMID") 31 - access("CURRENTSTATE"="WORKITEMSTATE") 33 - filter("STATUS"=2) 34 - access("MAIN_ID"="MAIN_ID") 37 - access("WORKITEM_ID"="WORKITEMID") 37 - filter("CURRENTSTATE"=4 OR "CURRENTSTATE"=10 OR "PROLONG_READ_FLAG">0 OR "PHASE_BACK_READ_FLAG">0) 40 - filter("PARTIINTYPE"='EXE' OR "PARTIINTYPE"='GET') 41 - access("PARTICIPANTTYPE"='role' AND ("PARTICIPANT"='8a9580bc22fe4d430123037a68e208fd' OR "PARTICIPANT"='8a9580bc29781f21012981a824492b13')) 42 - filter("CURRENTSTATE"="WORKITEMSTATE") 43 - access("WORKITEMID"="WORKITEMID") 45 - filter("STATUS"=2) 46 - access("MAIN_ID"="MAIN_ID") 48 - access("WORKITEM_ID"="WORKITEMID") 48 - filter("CURRENTSTATE"=4 OR "CURRENTSTATE"=10 OR "PROLONG_READ_FLAG">0 OR "PHASE_BACK_READ_FLAG">0) 52 - filter(("PARTIINTYPE"='EXE' OR "PARTIINTYPE"='GET') AND (LNNVL("PARTICIPANTTYPE"='role') OR LNNVL("PARTICIPANT"='8a9580bc22fe4d430123037a68e208fd') AND LNNVL("PARTICIPANT"='8a9580bc29781f21012981a824492b13'))) 53 - access(("PARTICIPANTTYPE"='person' AND "PARTICIPANT"='wangqm' OR "PARTICIPANTTYPE"='organization' AND "PARTICIPANT"='1238')) 57 - access("WORKITEMID"="WORKITEMID") 59 - access("CURRENTSTATE"="WORKITEMSTATE") 61 - filter("STATUS"=2) 62 - access("MAIN_ID"="MAIN_ID") ...
Note:
在使用到索引的位置使用的是nested-loop join, 而且多处出现,这样该语句自身就增加了buffer在访问该索引时,结果就是导致更高的latch CBC竞争,另外该索引是在BITMAP CONVERSION中使用,通常是要么是创建了不科学的索引,如在selectivity 较低的列创建了索引,导致使用了错误的执行计划,如改变两索引为组合索引解决,或通过_b_tree_bitmap_plans参数考虑在SQL,SESSION,SYSTEM level禁用普通tree索引转换bitmap的特性。
对不起,这篇文章暂时关闭评论。