Logon Storm and Memory shortage issue ORA-07445 dbgtrReserveSpace (记录一次登录风暴致内存不足的问题)
记录一次内存不足使用SWAP 严重时的数据库表现。 oracle 11203 on linux 64
cpu core 8,物理内存:8g,问题当时物理内存使用完,swap 已使用4G,有配置hugepage,且 vmstat 查看swap in out 严重。负载60左右
Check oracle alert
2013-10-19 08:51:10.384000 +08:00 Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x10] [PC:0x938E8C5, dbgtrReserveSpace()+99] [flags: 0x0, count: 1] Errors in file /oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_13864.trc (incident=60801): ORA-07445: exception encountered: core dump [dbgtrReserveSpace()+99] [SIGSEGV] [ADDR:0x10] [PC:0x938E8C5] [Address not mapped to object] [] Incident details in: /oracle/diag/rdbms/ora11g/ora11g/incident/incdir_60801/ora11g_ora_13864_i60801.trc Use ADRCI or Support Workbench to package the incident. See Note 411.1 at My Oracle Support for error and packaging details. 2013-10-19 08:52:14.986000 +08:00 Sweep [inc][60837]: completed Sweep [inc][60836]: completed Sweep [inc][60835]: completed Sweep [inc][60834]: completed Sweep [inc][60833]: completed ... Sweep [inc][60839]: completed Sweep [inc][60838]: completed 2013-10-19 08:53:24.174000 +08:00 Errors in file /oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_14250.trc (incident=47873): ORA-00600: internal error code, arguments: [17147], [0x15A2005D0], [], [], [], [], [], [], [], [], [], [] Incident details in: /oracle/diag/rdbms/ora11g/ora11g/incident/incdir_47873/ora11g_ora_14250_i47873.trc ... 2013-10-19 09:10:01.732000 +08:00 WARNING: Heavy swapping observed on system in last 5 mins. 2013-10-19 09:10:13.915000 +08:00 pct of memory swapped in [8.73%] pct of memory swapped out [4.84%]. Please make sure there is no memory pressure and the SGA and PGA are configured correctly. Look at DBRM trace file for more details.
Incident details
vi /oracle/diag/rdbms/ora11g/ora11g/incident/incdir_60801/ora11g_ora_13864_i60801.trc Dump continued from file: /oracle/diag/rdbms/ora11g/ora11g/trace/ora11g_ora_13864.trc ORA-07445: exception encountered: core dump [dbgtrReserveSpace()+99] [SIGSEGV] [ADDR:0x10] [PC:0x938E8C5] [Address not mapped to object] [] ========= Dump for incident 60801 (ORA 7445 [dbgtrReserveSpace()+99]) ======== ----- Beginning of Customized Incident Dump(s) ----- Exception [type: SIGSEGV, Address not mapped to object] [ADDR:0x10] [PC:0x938E8C5, dbgtrReserveSpace()+99] [flags: 0x0, count: 1] Registers: %rax: 0x00000000770000e4 %rbx: 0x0000000000000068 %rcx: 0x0000000000000068 %rdx: 0x00000001658f10c0 %rdi: 0x00002b25a44ed710 %rsi: 0x00007fff08087ae8 %rsp: 0x00007fff08087930 %rbp: 0x00007fff080879b0 %r8: 0x0000000000000068 %r9: 0x0000000000000001 %r10: 0x0000000000000068 %r11: 0x0000000000000000 %r12: 0x0000000000000000 %r13: 0x00007fff08087ae8 %r14: 0x00000001658f10c0 %r15: 0x0000000000000001 %rip: 0x000000000938e8c5 %efl: 0x0000000000010206 dbgtrReserveSpace()+83 (0x938e8b5) jnz 0x938f18c dbgtrReserveSpace()+89 (0x938e8bb) mov (%r14),%r12 dbgtrReserveSpace()+92 (0x938e8be) or $64,%eax dbgtrReserveSpace()+95 (0x938e8c1) mov %eax,0x8(%r14) > dbgtrReserveSpace()+99 (0x938e8c5) mov 0x10(%r12),%r8 dbgtrReserveSpace()+104 (0x938e8ca) mov 0x18(%r12),%r9 dbgtrReserveSpace()+109 (0x938e8cf) mov %r9,%rax dbgtrReserveSpace()+112 (0x938e8d2) sub %r8,%rax dbgtrReserveSpace()+115 (0x938e8d5) movzwl %ax,%eax ----- Call Stack Trace ----- skdstdst()+36 call kgdsdst() ksedst1()+98 call skdstdst() ksedst()+34 call ksedst1() dbkedDefDump()+2741 call ksedst() ksedmp()+36 call dbkedDefDump() ssexhd()+2458 call ksedmp() __sighandler() call ssexhd() dbgtrReserveSpace() signal __sighandler() +99 dbgtTrcData_int()+3 call dbgtrReserveSpace() 18 kstipg()+740 call dbgtTrcData_int() dbktbDefaultBucketA call kstipg() lloc()+87 ksucrp()+1470 call dbktbDefaultBucketA lloc() opiino()+1477 call ksucrp() opiodr()+916 call opiino() opidrv()+565 call opiodr() sou2o()+98 call opidrv() opimai_real()+128 call sou2o() ssthrdmain()+252 call opimai_real() main()+196
Analyze listener.ora log
[oracle@dbserver89 trace]$ fgrep "19-OCT-2013 08:" listener.log_bak|fgrep "establish"|awk '{print $1 "" $2}'| awk -F: '{print $1 ":" $2}' |sort|uniq -c ... 114 19-OCT-201308:40 83 19-OCT-201308:41 57 19-OCT-201308:42 13 19-OCT-201308:43 84 19-OCT-201308:44 93 19-OCT-201308:45 95 19-OCT-201308:46 69 19-OCT-201308:47 71 19-OCT-201308:48 65 19-OCT-201308:49 52 19-OCT-201308:50 127 19-OCT-201308:51 39 19-OCT-201308:52 99 19-OCT-201308:53 39 19-OCT-201308:54 338 19-OCT-201308:55 785 19-OCT-201308:56 1229 19-OCT-201308:57 1122 19-OCT-201308:58 1190 19-OCT-201308:59 ...
[root@dbserver89 sa]# sar -f sa19 -s 08:00:00 -e 10:00:00 Linux 2.6.18-92.el5 (dbserver89) 10/19/2013 08:00:01 AM CPU %user %nice %system %iowait %steal %idle 08:10:01 AM all 0.13 0.00 0.09 0.11 0.00 99.67 08:20:02 AM all 0.17 0.00 0.11 0.15 0.00 99.58 08:30:01 AM all 0.12 0.00 0.09 0.25 0.00 99.54 08:40:01 AM all 0.25 0.00 0.13 0.54 0.00 99.09 08:50:01 AM all 0.54 0.00 0.17 0.84 0.00 98.45 09:00:01 AM all 0.84 0.00 0.40 0.66 0.00 98.11 09:10:53 AM all 0.80 0.00 8.02 42.91 0.00 48.27 09:20:05 AM all 0.50 0.00 8.24 32.31 0.00 58.95 09:30:06 AM all 0.69 0.00 5.29 27.21 0.00 66.81 09:40:02 AM all 0.59 0.00 7.75 27.87 0.00 63.79 09:50:01 AM all 0.33 0.00 0.31 1.40 0.00 97.96 Average: all 0.45 0.00 2.78 12.28 0.00 84.49
Logon Storm
A Logon storm is a sudden spike in the number of client connection requests. Logon storms can occur due to a variety of factors. They could be malicious like a DoS attack. Or they could occur due to administrative actions – such as a middle tier coming online.
The process of starting a database session has inherent CPU costs associated with it – from the creation of a new OS process to attaching to the SGA. Every system has a limit above which the spawning of new database sessions can starve existing sessions of CPU resources. This could result in many unexpected consequences including application timeouts, increased response times, un-responsiveness and other cascading effects.
同时oracle 11g 默认是启用了 db audit 审计。
如果查询那个时间段的事件可以用下面的方法
SELECT snap_id, sample_time, session_type, session_id,sql_opname, sql_id, event, p1, p2, blocking_session, blocking_hangchain_info, current_obj#, machine FROM DBA_HIST_ACTIVE_SESS_HISTORY WHERE sample_time BETWEEN TO_DATE ('2013-10-19 08:00', 'yyyy-mm-dd hh24:mi') AND TO_DATE ('2013-10-19 10:00', 'yyyy-mm-dd hh24:mi') ORDER BY 1,
会发现很多buffer busy wait 对象是aud$. 同时伴有resmgr:cpu quantum
对不起,这篇文章暂时关闭评论。