Troubleshooting select ‘hang’/spin on 12C Active DataGuard
今天遇到一套oracle 12c RAC数据库,查询一个分区表的某一个分区,其实就1条记录,段大小也就4M,在primary查询秒出,但是在standby 查询十几分钟不出结果,注意性能问题hang和spin是两个类型,卡住不动是hang,死循环是spin, 通过v$session.seq也可以判断,当然也可以从call stack判断,下面记录一下这个问题。
SQL> select ACCOUNTID FROM "USR1"."TABLE_XX";
-- no feedback
SQL> @ase
USERNAME SID EVENT MACHINE MODULE STATUS et SQL_ID WAI_SECINW ROW_WAIT_OBJ# SQLTEXT BS CH# OSUSER SEQ# HEX
----------- ------- -------------------- ---------- -------------------- -------- --------- --------------- ---------- ------------- ------------------------------ ---------- ---- ---------- ------- ---------
PUBLIC 10645 Data Guard server op qdyye1 oracle ACTIVE 0 0:0 -1 : oracle 3947
PUBLIC 25042 Data Guard server op qdyye2 oracle ACTIVE 0 0:0 -1 : oracle 49019
SYS 13777 On CPU / runqueue anbob1 sqlplus ACTIVE 63 7jh8bdq9an4fg -1:63 -1 select ACCOUNTID FROM "USR1"." : 0 oracle 312 1000001
SQL> @snapper all 10 1 13777
Sampling SID 13777 with interval 10 seconds, taking 1 snapshots...
-- Session Snapper v4.14 BETA - by Tanel Poder ( http://blog.tanelpoder.com ) - Enjoy the Most Advanced Oracle Troubleshooting Script on the Planet! :)
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SID @INST, USERNAME , TYPE, STATISTIC , DELTA, HDELTA/SEC, %TIME, GRAPH , NUM_WAITS, WAITS/SEC, AVERAGES
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
13777 @1, SYS , STAT, session logical reads , 4877155, 366.35k, , , , , 6.5M total buffer visits
13777 @1, SYS , STAT, consistent gets , 4877128, 366.35k, , , , , ~ per execution
13777 @1, SYS , STAT, consistent gets from cache , 4877128, 366.35k, , , , , ~ per execution
13777 @1, SYS , STAT, consistent gets pin , 3251418, 244.23k, , , , , ~ per execution
13777 @1, SYS , STAT, consistent gets examination , 1625708, 122.12k, , , , , ~ per execution
13777 @1, SYS , STAT, logical read bytes from cache , 39953399808, 3G, , , , , ~ per execution
13777 @1, SYS , STAT, shared hash latch upgrades - no wait , 1625721, 122.12k, , , , , ~ per execution
13777 @1, SYS , STAT, calls to kcmgcs , 3251438, 244.24k, , , , , ~ per execution
13777 @1, SYS , STAT, buffer is pinned count , 1624527, 122.03k, , , , , 24.99 % buffer gets avoided thanks to buffer pin caching
-- End of Stats snap 1, end=2021-08-10 16:35:32, seconds=13.3
----------------------------------------------------------------------------------------------------
Active% | INST | SQL_ID | SQL_CHILD | EVENT | WAIT_CLASS
----------------------------------------------------------------------------------------------------
100% | 1 | 7jh8bdq9an4fg | 0 | ON CPU | ON CPU
-- End of ASH snap 1, end=2021-08-10 16:35:32, seconds=10, samples_taken=87
PL/SQL procedure successfully completed.
SQL> @seg USR1.TABLE_XX
SEG_MB OWNER SEGMENT_NAME SEG_PART_NAME SEGMENT_TYPE SEG_TABLESPACE_NAME BLOCKS HDRFIL HDRBLK
---------- -------------------- ------------------------------ ------------------------------ -------------------- ------------------------------ ---------- ---------- ----------
...
4 USR1 TABLE_XX PART_319 TABLE PARTITION DATA319 512 15 308873
4 USR1 TABLE_XX PART_335 TABLE PARTITION DATA335 512 16 308873
4 USR1 TABLE_XX PART_336 TABLE PARTITION DATA336 512 987 317065
执行计划是Full Table SCAN.
SQL> oradebug setorapid 262
Oracle pid: 262, Unix process pid: 244, image: oracle@anbob1 (TNS V1-V3)
SQL> oradebug short_stack;
ksedsts()+592<-ksdxfstk()+64<-ksdxcb()+2000<-sspuser()+640<-<-kcbgtcr()+37169<-kcbgcurstdby()+160<-$cold_kcbgcur()+624<-ktuGetUsegHdr()+2160<-ktuGetExtTxnInfo()+2192<-ktugti()+96<-$cold_ktuxcm()+1072<-ktspgfb_adg()+3936<-$cold_ktspGetNextL1ForScan()+1312<-ktspGenExtentMap1()+672<-kteinmap0()+4880<-kteinmap1()+48<-kteinmap()+48<-kdselini()+128<-kdsirsf()+7200<-kdsirs_fco()+352<-qertbFetch()+7744<-qergiFetch()+1168<-opifch2()+7648<-opifch()+112<-opiodr()+2176<-ttcpip()+1760<-opitsk()+3936<-opiino()+1712<-opiodr()+2176<-opidrv()+1824<-sou2o()+288<-opimai_real()+608<-ssthrdmain()+864<-main()+336<-main_opd_entry()+80 SQL> oradebug short_stack;
ksedsts()+592<-ksdxfstk()+64<-ksdxcb()+2000<-sspuser()+640<-<-kteinscb()<-kcbgtcr()+30624<-ktspgfb_adg()+3392<-$cold_ktspGetNextL1ForScan()+1312<-ktspGenExtentMap1()+672<-kteinmap0()+4880<-kteinmap1()+48<-kteinmap()+48<-kdselini()+128<-kdsirsf()+7200<-kdsirs_fco()+352<-qertbFetch()+7744<-qergiFetch()+1168<-opifch2()+7648<-opifch()+112<-opiodr()+2176<-ttcpip()+1760<-opitsk()+3936<-opiino()+1712<-opiodr()+2176<-opidrv()+1824<-sou2o()+288<-opimai_real()+608<-ssthrdmain()+864<-main()+336<-main_opd_entry()+80 SQL> oradebug short_stack;
ksedsts()+592<-ksdxfstk()+64<-ksdxcb()+2000<-sspuser()+640<-<-kcbgtcr()+33169<-ktspgfb_adg()+3392<-$cold_ktspGetNextL1ForScan()+1312<-ktspGenExtentMap1()+672<-kteinmap0()+4880<-kteinmap1()+48<-kteinmap()+48<-kdselini()+128<-kdsirsf()+7200<-kdsirs_fco()+352<-qertbFetch()+7744<-qergiFetch()+1168<-opifch2()+7648<-opifch()+112<-opiodr()+2176<-ttcpip()+1760<-opitsk()+3936<-opiino()+1712<-opiodr()+2176<-opidrv()+1824<-sou2o()+288<-opimai_real()+608<-ssthrdmain()+864<-main()+336<-main_opd_entry()+80 SQL> oradebug short_stack;
ksedsts()+592<-ksdxfstk()+64<-ksdxcb()+2000<-sspuser()+640<-<-kcscu8()<-$cold_kcbgtcr()+60160<-ktspgfb_adg()+2272<-$cold_ktspGetNextL1ForScan()+1312<-ktspGenExtentMap1()+672<-kteinmap0()+4880<-kteinmap1()+48<-kteinmap()+48<-kdselini()+128<-kdsirsf()+7200<-kdsirs_fco()+352<-qertbFetch()+7744<-qergiFetch()+1168<-opifch2()+7648<-opifch()+112<-opiodr()+2176<-ttcpip()+1760<-opitsk()+3936<-opiino()+1712<-opiodr()+2176<-opidrv()+1824<-sou2o()+288<-opimai_real()+608<-ssthrdmain()+864<-main()+336<-main_opd_entry()+80 SQL> oradebug short_stack;
ksedsts()+592<-ksdxfstk()+64<-ksdxcb()+2000<-sspuser()+640<-<-$cold_kcbgcur()+624<-ktuGetUsegHdr()+2160<-ktuGetExtTxnInfo()+2192<-ktugti()+96<-$cold_ktuxcm()+1072<-ktspgfb_adg()+3936<-$cold_ktspGetNextL1ForScan()+1312<-ktspGenExtentMap1()+672<-kteinmap0()+4880<-kteinmap1()+48<-kteinmap()+48<-kdselini()+128<-kdsirsf()+7200<-kdsirs_fco()+352<-qertbFetch()+7744<-qergiFetch()+1168<-opifch2()+7648<-opifch()+112<-opiodr()+2176<-ttcpip()+1760<-opitsk()+3936<-opiino()+1712<-opiodr()+2176<-opidrv()+1824<-sou2o()+288<-opimai_real()+608<-ssthrdmain()+864<-main()+336<-main_opd_entry()+80 SQL>
Parsed short stack:
(?)
kteinscb() kernel transaction extent management information operations [partial hit for: ktein ]
kcbgtcr()+30624 kernel cache buffers get consistent read
ktspgfb_adg()+3392<ktspGetNextL1ForScan()+1312 kernel transaction segment management segment pagetable [partial hit for: ktsp ]
ktspGenExtentMap1()+672 kernel transaction segment management segment pagetable [partial hit for: ktsp ]
kteinmap0()+4880 kernel transaction extent management information operations return extent map [partial hit for: kteinmap ]
kteinmap1()+48 kernel transaction extent management information operations return extent map [partial hit for: kteinmap ]
kteinmap()+48 kernel transaction extent management information operations return extent map
kdselini()+128 kernel data seek/scan extent list initialize
kdsirsf()+7200 kernel data seek/scan [partial hit for: kds ]
kdsirs_fco()+352 kernel data seek/scan [partial hit for: kds ]
qertbFetch()+7744 query execute rowsource table access fetch
select region FROM "USR1"."TABLE_XX" partition(part_336);
REGION
----------
336
SQL> @x
Display execution plan for last statement for this session from library cache...
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
SQL_ID 5yuc1gy6u09vf, child number 0
-------------------------------------
select region FROM "USR1"."TABLE_XX" partition(part_336)
Plan hash value: 798915741
------------------------------------------------------------------------------------------------------
| Id | Operation | Name | E-Rows |E-Bytes| Cost (%CPU)| Pstart| Pstop |
------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | | | 1 (100)| | |
| 1 | PARTITION RANGE SINGLE| | 1 | 4 | 1 (0)| 12 | 12 |
| 2 | INDEX FULL SCAN | IDX_UNI_TABLE_XX_OID | 1 | 4 | 1 (0)| 12 | 12 |
------------------------------------------------------------------------------------------------------
Note:
使用索引扫描可以很快出结果,但是使用表扫描一直处于spin, 表段很小4m, 但是查询每秒有366k的consistent gets ,从call stack看似乎一直在读取segmeng L1获取extent Map,在MOS中去查询比较匹配Bug 30325407
原因:
spin是由 L1BMB 块的意外状态引起的。要清除此状态,使用dbms_space.space_usage指定段分区即可,通常这个package也用于查看段碎片。
对不起,这篇文章暂时关闭评论。