Troubleshooting “DFS lock handle” wait event
a few days ago,My CRM DB face to a performance issue. DB is very slow. then I check DB wait event of active session ,Most sessions is waiting “DFS lock handel” , ‘DFS lock handle’ can cause massive performance issues in a busy RAC cluster. my db is 10R2 2NODE RAC ON AIX.
Riyaj Shamsudeen Said
”
DFS (stands for Distributed File System) is an ancient name, associated with cluster file system operations, in a Lock manager supplied by vendors in Oracle Parallel Server Environment (prior name for RAC). But, this wait event has morphed and is now associated with waits irrelevant to database files also. ”
How does it work?
I have no access to the code, so read this paragraph with caution, as I may have misunderstood my test results: A process trying to acquire a lock on a global GES resource sends a AST(Asynchronous Trap) or BAST (Blocking Asynchronous Trap) message to LCK process, constructing the message with (lock pointer, resource pointer, and resource name) information. If the resource is not available, then the LCK process sends a message to the lock holder for a lock downgrade.
Now, let’s imagine that we need to ask remote background process to do some operations, say, DBWR to do an object checkpoint in a remote instance. Further, the requestor must wait for the background process to complete the task. What better way is there to implement these waits, than using a lock? It seems that acquiring a lock on a specific resource in a specific mode, triggers a predefined action in that background process. For example, acquiring locks on a specific CI resource triggers the DBWR to object level checkpoint (details later).
While the background process is working, the requesting process is instrumented to wait on the wait event ‘DFS lock handle’, in essence, the requesting process is waiting for one or more background process to perform a specific action. BTW, in some cases, background processes also can wait for ‘DFS lock handle’. Excessive waits by the foreground or background process for the DFS lock handle wait event can lead to instance freeze or application freeze.
Every resource has a value block and can be used to send more information about the resource, such as object_id for a sequence, object_id for a table partition etc.
How to diag
During a period of wait, use the following SQL statement to retrieve the name of the lock and the mode of the lock request:
select chr(bitand(p1,-16777216)/16777215) || chr(bitand(p1, 16711680)/65535) “Lock”, to_char(bitand(p1, 65536)) “Mode”, p2, p3 , seconds_in_wait from v$session_wait where event = ‘DFS lock handle’
Then use the following to identify the requesting session and the blocking session:
select inst_id, sid, type, id1, id2, lmode, request, block from gv$lock where type=’CI’ and id1=9 and id2=5
In this example, use the values from the first query in the where clause. type=Lock, id1=p1 and id2=p2.
CASE 1
note: In this article I used Tanel Poder expert ‘s SQL scripts package
SQL> select * from v$version; BANNER ---------------------------------------------------------------- Oracle Database 10g Enterprise Edition Release 10.2.0.5.0 - 64bi PL/SQL Release 10.2.0.5.0 - Production CORE 10.2.0.5.0 Production TNS for HPUX: Version 10.2.0.5.0 - Production NLSRTL Version 10.2.0.5.0 - Production SQL> @a A-Script: Display active sessions... COUNT(*) SQL_ID STATE EVENT ---------- ------------- ------- ---------------------------------------------------------------- 88 9w5uw0jh30vtz WAITING DFS lock handle 63 a22wxf47tpjb1 WAITING DFS lock handle 49 c57w8qqawbhwy WAITING DFS lock handle 34 1yhkd9t3w21ub WAITING DFS lock handle 16 7wqbd0wkxb3p8 WAITING DFS lock handle 15 b88q4nv18t8zx WAITING DFS lock handle 9 3kzcw83n08t0g WAITING DFS lock handle 8 g22mvfwwbty7f WAITING DFS lock handle 7 2t3505ydschdc WAITING DFS lock handle 7 8d9pjq8by13nb WAITING DFS lock handle 7 WAITING log file sync 5 5buc1nvhfbzfb WAITING DFS lock handle 5 9hf9fgxvpxnhm WAITING DFS lock handle 5 9k04jggqzshs7 ON CPU On CPU / runqueue 4 b88q4nv18t8zx WAITING enq: TX - row lock contention 4 fsyu9fuanadtv WAITING DFS lock handle 4 5n6vwvrgtqh86 WAITING DFS lock handle SQL> @xi 9w5uw0jh30vtz % eXplain the execution plan for sqlid 9w5uw0jh30vtz child %... PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------- SQL_ID 9w5uw0jh30vtz, child number 0 INSERT INTO "CO_ORDER_ITEM_TEMP_T" ( "ORDER_ITEM_ID" , "CUST_ORDER_ID" , "ORDER_ITEM_CD" , "CUST_WORKSHEET_ID" , "STATUS_CD" , "STATUS_DATE" , "STATUS_CHANGE_REASON" , "PRIORITY" , "PRE_HANDLE_FLAG" , "HANDLE_TIME" , "ARCHIVE_DATE" , "FINISH_TIME" , "ORDER_ITEM_OBJ_ID" , "INSTALL_TIME_SEGMENT_ID" , "BEGIN_TIME_SEGMENT_ID" , "SERVICE_OFFER_ID" , "REASON" , "EXT_ORDER_ITEM_ID" , "LAN_ID" , "ACCEPT_CITY" , "CITY_CODE" , "EFFECT_TIME" , "ACCEPT_REASON" , "BOOK_DATE" , "BOOKING_OPEN_DATE", "ORDER_GROUP_ID" , "PARENT_GROUP_ID" , "STANDARD_OBJ_ID","ACCEPT_GROUP_ID", "ORDER_ITEM_TYPE","TIME_NAME","EXT_ORDER_ITEM_GROUP_ID" ) VALUES ( :1 , :2 , :3 , :4 , :5 , :6 , :7 , :8 , :9 , :10 , :11 , :12 , :13 , :14 , :15 , :16 , :17 , :18 , :19 , :20 , :2 SQL> @snapper ash 5 1 all Sampling SID all with interval 5 seconds, taking 1 snapshots... -- Session Snapper v4.11 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :) ---------------------------------------------------------------------------------------------------- Active% | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS ---------------------------------------------------------------------------------------------------- 9600% | 1 | 9w5uw0jh30vtz | 0 | DFS lock handle | Other 5300% | 1 | gn1vcj4rx7b1t | 1 | ON CPU | ON CPU 5200% | 1 | a22wxf47tpjb1 | 0 | DFS lock handle | Other 5000% | 1 | c57w8qqawbhwy | 0 | DFS lock handle | Other 3000% | 1 | 9w5uw0jh30vtz | 0 | gc buffer busy | Cluster 3000% | 1 | | | latch: library cache | Concurrency 2900% | 1 | 9k04jggqzshs7 | 0 | ON CPU | ON CPU 2700% | 1 | 9hf9fgxvpxnhm | 1 | DFS lock handle | Other 2100% | 1 | 7wqbd0wkxb3p8 | 0 | DFS lock handle | Other 1700% | 1 | b88q4nv18t8zx | 2 | DFS lock handle | Other SQL> select username,machine,program,p1,p2,p3,p1text,p2text,sql_id,event from v$session where event='DFS lock handle'; USERNAME MACHINE PROGRAM P1 P2 P3 P1TEXT P2TEXT SQL_ID EVENT --------------- ------------ ----------------- ---------- ---------- ---------- ------------ --------- ------------- -------------------- MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 a22wxf47tpjb1 DFS lock handle MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 9w5uw0jh30vtz DFS lock handle MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 7wqbd0wkxb3p8 DFS lock handle MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 a22wxf47tpjb1 DFS lock handle MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 c57w8qqawbhwy DFS lock handle MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 9hf9fgxvpxnhm DFS lock handle MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 3kzcw83n08t0g DFS lock handle MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 9w5uw0jh30vtz DFS lock handle ... MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 9w5uw0jh30vtz DFS lock handle MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 9w5uw0jh30vtz DFS lock handle MY_ANBOB_COM HOSXXXXXX JDBC Thin Client 1398145029 4252665 0 type|mode id1 a22wxf47tpjb1 DFS lock handle
已选择472行。
# identify the type of lock associated with this event.
SQL> select chr(bitand(&&p1,-16777216)/16777215) || 2 chr(bitand(&&p1,16711680)/65535) type, 3 mod(&&p1, 16) md 4 from dual 5 ; 输入 p1 的值: 1398145029 TY MD -- ---------- SV 5 SQL> select * from v$lock_type where type='SV'; TYPE NAME ID1_TAG ID2_TAG IS_ DESCRIPTION ---------- --------------------- ------------- ------------ ------ ------------------ SV Sequence Ordering object # 0 NO Lock to ensure ordered sequenc
# identified by a resource name
SQL> @oid 4252665 owner object_name object_type SUBOBJECT_NAME CREATED LAST_DDL_TIME status DATA_OBJECT_ID ------------------------- ------------------------------ ------------------ ------------------------------ ----------------- ----------------- --------- -------------- WWW_ANBOB_COM SEQ_XX_VERSION_S SEQUENCE 20141204 01:31:51 20141210 20:06:56 VALID SQL> select * from dba_sequences where sequence_name='SEQ_XX_VERSION_S'; SEQUENCE_OWNER SEQUENCE_NAME MIN_VALUE MAX_VALUE INCREMENT_BY C O CACHE_SIZE LAST_NUMBER ------------------------------ ------------------------------ ---------- ---------- ------------ - - ---------- ----------- WWW_ANBOB_COM SEQ_XX_VERSION_S 1 1.0000E+14 1 N Y 20 168011961
NOTE:
the sequence “SEQ_XX_VERSION_S” cache size 20 and with ORDER attribute, if the sequences are accessed in different instances. To accommodate this requirement, RAC code must ensure that next value retrieved from the sequence will be in an ORDER in any instance and the co-ordination mechanism is by transferring couple of GES messages between the instances.
SQL> alter sequence WWW_ANBOB_COM .SEQ_XX_VERSION_S cache 1000 noorder; 序列已更改。
Tip:
then the solution probably would be that, ORDER attribute is not a suitable attribute for that heavily accessed sequence. My recommendation is always been (a) Use bigger cache >1000 for the sequences frequently accessed
SQL> @a A-Script: Display active sessions... COUNT(*) SQL_ID STATE EVENT ---------- ------------- ------- ---------------------------------------------------------------- 12 WAITING log file sync 11 5xhp3dsuv0zf8 WAITING gc buffer busy 8 8h67q133fayyj WAITING gc buffer busy 8 5sw4j8f1hk5xc WAITING gc buffer busy 7 dm6azg0mj90gp WAITING gc buffer busy ...
wait a moment, DFS lock handle wait event is gone. The database performance problems solved.
References Riyaj Shamsudeen ‘s article.
对不起,这篇文章暂时关闭评论。