Oracle High Load: A Case Study with AWR Analysis
A few days ago one of our database servers fired a load alarm reporting that the load was too high.
Below is a start-to-finish analysis of the problem on this system. It is offered for reference only; if you find anything I missed, feel free to mail me.
By the time the ssh session connected, the machine was already painfully slow. Checking with top:

top - 11:59:55 up 660 days, 20:35,  2 users,  load average: 228.52, 231.29, 206.63
Tasks: 711 total,   4 running, 707 sleeping,   0 stopped,   0 zombie
Cpu0  : 16.2%us,  1.0%sy,  0.0%ni, 20.1%id, 62.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu1  : 11.6%us,  0.3%sy,  0.0%ni, 26.9%id, 61.1%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu2  : 16.4%us,  0.7%sy,  0.0%ni, 19.4%id, 63.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu3  : 15.2%us,  2.6%sy,  0.0%ni, 30.7%id, 51.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu4  : 23.4%us,  1.0%sy,  0.0%ni,  4.3%id, 71.3%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu5  : 13.5%us,  0.7%sy,  0.0%ni, 37.2%id, 48.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu6  : 17.8%us,  0.7%sy,  0.0%ni, 10.9%id, 70.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu7  : 13.9%us,  0.7%sy,  0.0%ni, 22.8%id, 62.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu8  : 17.2%us,  1.0%sy,  0.0%ni, 17.2%id, 64.6%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu9  : 13.5%us,  0.7%sy,  0.0%ni, 20.4%id, 65.5%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu10 : 15.5%us,  0.7%sy,  0.0%ni, 22.0%id, 61.8%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu11 : 13.5%us,  0.7%sy,  0.0%ni, 28.7%id, 57.1%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu12 : 33.1%us,  2.0%sy,  0.0%ni,  1.0%id, 62.9%wa,  0.0%hi,  1.0%si,  0.0%st
Cpu13 : 13.5%us,  0.3%sy,  0.0%ni, 25.4%id, 60.7%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu14 : 16.2%us,  0.7%sy,  0.0%ni,  6.3%id, 76.8%wa,  0.0%hi,  0.0%si,  0.0%st
Cpu15 : 13.9%us,  0.3%sy,  0.0%ni, 13.9%id, 71.9%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16430068k total, 16376792k used,    53276k free,     6264k buffers
Swap: 16386292k total,  1508260k used, 14878032k free,  6685768k cached

  PID USER    PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+ COMMAND
14499 oracle  16   0 7207m 1.6g 1.6g D    7 10.0  0:45.25 oracle
11771 oracle  16   0 7208m 2.3g 2.3g D    6 14.7  1:15.00 oracle
14652 oracle  16   0 7207m 1.6g 1.5g D    6  9.9  0:46.10 oracle
12230 oracle  18   0 7206m 2.2g 2.2g D    6 14.2  1:13.13 oracle
14591 oracle  17   0 7203m 558m 553m D    4  3.5  0:03.69 oracle
Note:
The load average on this 16-core machine has been above 200 across the 1-, 5-, and 15-minute windows, so it is running far beyond capacity; the healthy range is load average <= number of CPU cores. RES is the non-swapped physical memory a task has used, and each of these Oracle processes holds a rather large resident set relative to the current data volume of this system.
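The rule of thumb above (load average should not exceed the core count) is easy to check mechanically. A minimal sketch, with the figures simply copied from the top output:

```python
# Compare load averages against the core count; a ratio above 1.0 means
# more runnable/blocked tasks than cores (values copied from top above).
NUM_CORES = 16
LOADS = {"1m": 228.52, "5m": 231.29, "15m": 206.63}

def load_per_core(load, cores=NUM_CORES):
    """Load normalized by core count; > 1.0 means the box is oversubscribed."""
    return load / cores

for window, load in LOADS.items():
    ratio = load_per_core(load)
    print(f"{window}: {ratio:.1f}x cores {'OVERLOADED' if ratio > 1 else 'ok'}")
```

Every window here works out to roughly 14x the core count, i.e. the machine is oversubscribed by an order of magnitude.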
[oracle@xnh ~]$ vmstat 2 6
procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b    swpd    free   buff   cache   si   so    bi    bo   in   cs us sy id wa st
 6 24 1507332   53740   6352 6681248    0    0    12    11    0    0  2  0 98  0  0
 0 28 1507280   51368   6304 6680536   92    0 18740  7126 1816 2973 17  1 20 62  0
 0 38 1507172   54276   6256 6676488  256    0 14176  7624 1721 2634 13  1 19 67  0
 0 19 1507044   52508   6364 6676612  244    0 11476  7328 1816 2694 12  1 26 62  0
 0 74 1506936   51188   6300 6679264  236    0 13198  9266 1749 2967 15  1 12 72  0
 0 25 1506736   54848   6196 6673316  456    0 16088  7772 1769 3040 13  1 12 75  0
Note:
bi: blocks received from a block device (blocks/s).
bo: blocks sent to a block device (blocks/s).
Disk I/O is clearly very heavy in both directions, and dozens of processes sit blocked on I/O (the b column).
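A small parser makes the vmstat columns easier to work with. This is a sketch that assumes the exact column order printed by this vmstat version, with the second sample line pasted from above:

```python
# Parse one vmstat sample line into named fields (column order as printed
# by the vmstat run above: procs, memory, swap, io, system, cpu groups).
VMSTAT_FIELDS = ["r", "b", "swpd", "free", "buff", "cache", "si", "so",
                 "bi", "bo", "in", "cs", "us", "sy", "id", "wa", "st"]

def parse_vmstat(line):
    return dict(zip(VMSTAT_FIELDS, (int(v) for v in line.split())))

sample = parse_vmstat(
    "0 28 1507280 51368 6304 6680536 92 0 18740 7126 1816 2973 17 1 20 62 0")
# 28 processes in uninterruptible (disk) sleep, heavy block-in, 62% iowait
print(sample["b"], sample["bi"], sample["wa"])
```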
[oracle@xnh ~]$ sar -u
Linux 2.6.18-92.el5 (xnh)        04/20/2012

12:00:01 AM     CPU   %user   %nice  %system  %iowait  %steal   %idle
12:10:01 AM     all    0.05    0.00     0.04     0.01    0.00   99.91
12:20:01 AM     all    0.04    0.00     0.03     0.00    0.00   99.93
12:30:01 AM     all    0.04    0.00     0.03     0.00    0.00   99.93
12:40:01 AM     all    0.04    0.00     0.03     0.00    0.00   99.93
12:50:01 AM     all    0.04    0.00     0.04     0.00    0.00   99.92
01:00:01 AM     all    0.03    0.00     0.03     0.00    0.00   99.94
01:10:01 AM     all    0.05    0.00     0.04     0.00    0.00   99.91
01:20:01 AM     all    0.04    0.00     0.04     0.00    0.00   99.93
01:30:01 AM     all    0.03    0.00     0.03     0.00    0.00   99.94
.....
08:00:01 AM     all    0.47    0.00     0.05     0.01    0.00   99.46
08:10:01 AM     all    1.08    0.00     0.10     0.05    0.00   98.77
08:20:01 AM     all    4.73    0.00     0.32     0.48    0.00   94.48
08:30:01 AM     all    3.82    0.00     0.28     0.41    0.00   95.49
08:40:03 AM     all   11.95    0.00     0.70     7.79    0.00   79.55
08:50:01 AM     all    6.70    0.00     0.38     2.77    0.00   90.15
09:00:01 AM     all    5.28    0.00     0.27     0.06    0.00   94.39
09:10:01 AM     all    3.44    0.00     0.18     0.06    0.00   96.32
09:20:02 AM     all   10.42    0.00     0.48     3.29    0.00   85.80
09:30:02 AM     all   17.17    0.00     0.73    11.13    0.00   70.97
09:40:02 AM     all   18.83    0.00     0.63     4.24    0.00   76.30
09:50:02 AM     all    8.26    0.00     0.49     2.49    0.00   88.77
10:00:01 AM     all   24.67    0.00     0.97    11.45    0.00   62.92
10:10:06 AM     all   15.58    0.00     0.92    35.79    0.00   47.70
10:20:01 AM     all   32.48    0.00     0.96    16.03    0.00   50.53
10:30:01 AM     all    5.07    0.00     0.40     1.00    0.00   93.53
10:40:03 AM     all   14.05    0.00     0.83    17.73    0.00   67.38
10:50:14 AM     all   13.45    0.00     0.78    44.29    0.00   41.48
11:00:14 AM     all   11.97    0.00     0.84    71.30    0.00   15.89
11:10:15 AM     all   12.96    0.00     0.76    74.02    0.00   12.26
11:20:18 AM     all   10.87    0.00     0.74    66.45    0.00   21.94
11:30:20 AM     all   10.09    0.00     1.12    86.69    0.00    2.10
11:40:15 AM     all   14.47    0.00     2.67    68.64    0.00   14.23
11:50:15 AM     all   11.83    0.00     1.96    69.83    0.00   16.38
12:00:08 PM     all   15.22    0.00     0.75    67.64    0.00   16.39
Average:        all    4.29    0.00     0.33     9.25    0.00   86.13
Note:
From 11:00 through 12:00, %iowait hovered around 70%: the disks simply cannot keep up with the current read/write demand.
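The bad window can also be picked out programmatically by filtering the sar samples on %iowait. A sketch using the late-morning values copied from the table above:

```python
# Flag sar intervals where %iowait exceeds a threshold
# (time, %iowait pairs copied from the sar output above).
samples = [("10:50", 44.29), ("11:00", 71.30), ("11:10", 74.02),
           ("11:20", 66.45), ("11:30", 86.69), ("11:40", 68.64),
           ("11:50", 69.83), ("12:00", 67.64)]

IOWAIT_THRESHOLD = 60.0
hot = [t for t, wa in samples if wa > IOWAIT_THRESHOLD]
print(hot)  # every interval from 11:00 onward is above 60% iowait
```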
[oracle@xnh ~]$ iostat 2 4
Linux 2.6.18-92.el5 (xnh)        04/20/2012

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           1.58    0.00    0.09    0.04    0.00   98.28

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda               8.25       390.43       338.25  22293166143  19313501556

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.41    0.00    0.62   66.47    0.00   20.50

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             441.00     50904.00      8136.00       101808        16272

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          10.43    0.00    0.75   69.29    0.00   19.53

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             464.00     53764.00      6288.00       107528        12576

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.15    0.00    0.75   65.02    0.00   23.08

Device:            tps   Blk_read/s   Blk_wrtn/s     Blk_read     Blk_wrtn
sda             504.00     57120.00      8284.00       114240        16568
Note:
Blk_read/s indicates the amount of data read from the device, expressed in blocks per second. Blocks are equivalent to sectors with 2.4 and newer kernels and therefore have a size of 512 bytes; with older kernels, a block is of indeterminate size.
At its peak the system was reading roughly 27 MB per second (Blk_read/s × 512 bytes).
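The 27 MB/s figure follows directly from the sector size. A quick check:

```python
# Convert iostat Blk_read/s to MB/s, assuming 512-byte blocks (sectors),
# which holds on 2.4+ kernels as noted above.
SECTOR_BYTES = 512

def blk_to_mb_per_s(blk_per_s):
    return blk_per_s * SECTOR_BYTES / (1024 * 1024)

peak = blk_to_mb_per_s(57120.00)  # highest Blk_read/s sample above
print(f"{peak:.1f} MB/s")
```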
AWR REPORT
WORKLOAD REPOSITORY report for

DB Name      DB Id       Instance   Inst Num Release     RAC Host
------------ ----------- ---------- -------- ----------- --- ------------
XNH          4238368451  xnh               1 10.2.0.4.0  NO  xnh

            Snap Id      Snap Time           Sessions  Curs/Sess
            --------- -------------------   --------  ---------
Begin Snap:     18125 20-Apr-12 11:01:00         337        1.0
  End Snap:     18126 20-Apr-12 12:00:28         490        1.0
   Elapsed:              59.46 (mins)
   DB Time:          12,979.44 (mins)

Note: Load = DB time / (elapsed × num_cpus) × 100% = 12,979.44 / (59.46 × 16) × 100% ≈ 1364%.
DB time / elapsed ≈ 218, which is remarkably close to top's load average of 228.52.

Cache Sizes
~~~~~~~~~~~
                       Begin        End
                  ---------- ----------
    Buffer Cache:     6,000M     6,000M  Std Block Size:      8K
Shared Pool Size:       800M       800M      Log Buffer: 14,336K

Load Profile
~~~~~~~~~~~~
                           Per Second     Per Transaction
                      ---------------     ---------------
       Redo size:            8,742.72           22,421.60
   Logical reads:           31,639.22           81,142.02
   Block changes:               24.32               62.37
  Physical reads:            2,726.19            6,991.58
 Physical writes:              736.93            1,889.93
      User calls:              100.87              258.69
          Parses:               32.44               83.19
     Hard parses:                0.34                0.86
           Sorts:                6.05               15.51
          Logons:                0.28                0.71
        Executes:               33.35               85.53
    Transactions:                0.39

% Blocks changed per Read:   0.08    Recursive Call %:   25.63
Rollback per transaction %:  0.72    Rows per Sort:    1483.30

Note: physical reads run at about 21.3 MB/sec (54.6 MB per transaction). The database also has relatively high logical I/O at 31,639 reads per second. Logical reads include data block reads from both memory and disk, and high LIO is often associated with high CPU activity. CPU bottlenecks occur when the CPU run queue exceeds the number of CPUs on the database server, which can be seen in the "r" column of the vmstat UNIX/Linux utility or in the Windows performance manager. Consider tuning the application to reduce unnecessary data buffer touches (SQL tuning or PL/SQL bulking), using faster CPUs, or adding more CPUs to the system.
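The load arithmetic in the note can be reproduced straight from the snapshot header: DB time divided by elapsed time gives the average number of active sessions, which should track the OS load average when the load is database-driven. A sketch:

```python
# Average active sessions and CPU-relative load from the AWR header figures.
db_time_min = 12_979.44   # DB Time from the snapshot
elapsed_min = 59.46       # Elapsed
num_cpus = 16

avg_active_sessions = db_time_min / elapsed_min
load_vs_cpu_pct = avg_active_sessions / num_cpus * 100

print(f"{avg_active_sessions:.1f} avg active sessions, "
      f"{load_vs_cpu_pct:.0f}% of CPU capacity")
```

This works out to about 218 average active sessions, in the same ballpark as top's load average of 228.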
Instance Efficiency Percentages (Target 100%)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            Buffer Nowait %:   98.34       Redo NoWait %:  100.00
            Buffer  Hit   %:   93.04    In-memory Sort %:   99.58
            Library Hit   %:   98.19        Soft Parse %:   98.96
         Execute to Parse %:    2.73         Latch Hit %:   97.76
Parse CPU to Parse Elapsd %:    0.42     % Non-Parse CPU:   99.73

 Shared Pool Statistics        Begin    End
                              ------  ------
             Memory Usage %:   90.32   91.51
    % SQL with executions>1:   86.92   93.17
  % Memory for SQL w/exec>1:   84.56   91.41

Note: the shared pool memory area is 91.51% used. The Oracle shared pool contains the library cache, which holds the parsed and executable forms of SQL and PL/SQL; on 10g you can query dba_hist_librarycache to see the library cache's historical usage. With shared pool memory usage at 91.51% and "% Memory for SQL w/exec>1" at 91.41%, if the server has memory to spare it is worth increasing the shared_pool_size parameter.

Top 5 Timed Events                                          Avg   %Total
~~~~~~~~~~~~~~~~~~                                         wait    Call
Event                          Waits         Time (s)      (ms)    Time  Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
read by other session             1,878,857     221,360    118    28.4  User I/O
db file scattered read              788,260      40,703     52     5.2  User I/O
CPU time                                          7,968            1.0
db file sequential read             171,046       6,626     39     0.9  User I/O
latch: library cache                  1,233       3,091   2507     0.4  Concurrenc
-------------------------------------------------------------

Note: "read by other session" was split out as its own wait event in 10g; previously it was folded into buffer busy waits. It occurs when one session is reading a set of blocks from disk into the buffer cache while other sessions also want to load those same blocks, a classic hot-block pattern. An index range scan produces db file sequential read, while a full table scan or index fast full scan produces db file scattered read, so those events usually accompany read by other session. Either the I/O load is too heavy or the I/O performance is too poor; in any case the database is suffering read contention. The root cause is most likely application design: many sessions repeatedly hitting the same data at the same time. I first checked for bitmap indexes and for foreign keys lacking a unique index, and confirmed there are none. The official definition of read by other session follows.

Definition: When information is requested from the database, Oracle will first read the data from disk into the database buffer cache. If two or more sessions request the same information, the first session will read the data into the buffer cache while other sessions wait.
In previous versions this wait was classified under the “buffer busy waits” event. However, in Oracle 10.1 and higher this wait time is broken out into the “read by other session” wait event. Excessive waits for this event are typically due to several processes repeatedly reading the same blocks, e.g. many sessions scanning the same index or performing full-table scans on the same table. Tuning this issue is a matter of finding and eliminating the contention. Confio sums it up by noting that “read by other session” waits are very similar to buffer busy waits: when a session waits on the “read by other session” event, it is waiting for another session to read the data from disk into the Oracle buffer cache. If this happens too often, the performance of the query or of the entire database can suffer. Typically it is caused by contention for “hot” blocks or objects, so it is imperative to find out which data is being contended for; once that is known, there are several alternative ways to attack the issue.
Once read by other session is excluded, buffer busy wait covers the remaining case: the buffer is already in the buffer cache but held in an incompatible mode (that is, some other session is changing the buffer). That situation points even more strongly at application design, with multiple statements operating on the same blocks concurrently.

Wait Events  DB/Inst: XNH/xnh  Snaps: 18125-18126
-> s - second; cs - centisecond (100th of a second);
   ms - millisecond (1000th of a second); us - microsecond (1000000th of a second)
-> ordered by wait time desc, waits desc (idle events last)

                                             %Time  Total Wait     Avg     Waits
Event                                 Waits  -outs    Time (s)  wait (ms)    /txn
---------------------------- -------------- ------ ----------- --------- ---------
read by other session             1,878,857     .1     221,360       118   1,350.7
db file scattered read              788,260     .0      40,703        52     566.7
db file sequential read             171,046     .0       6,626        39     123.0
latch: library cache                  1,233     .0       3,091      2507       0.9
log file sync                         3,595   40.6       2,178       606       2.6
latch: shared pool                    1,312     .0       2,100      1601       0.9
log file parallel write               2,203     .0       1,201       545       1.6
control file parallel write           1,062     .0       1,013       954       0.8
latch: session allocation               223     .0         796      3569       0.2

db file scattered read averages 52 milliseconds per wait, which is quite slow. Scattered read events occur when the database reads multiple blocks of a table (full table scan) or an index (index fast full scan). Check whether any full table scans are caused by missing indexes, and confirm that the db_file_multiblock_read_count parameter is tuned. If those steps do not help, consider moving the data files onto faster physical disks, such as solid-state drives.
db file sequential read averages 39 milliseconds, also slow; this event occurs when reading a single block of a table or index, so non-buffered direct I/O is worth considering.
log file sync waits are very high at 2.6 per transaction: check whether an application is committing too frequently, move the redo log files onto faster devices, and consider increasing log_buffer.
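The "Avg wait (ms)" column is just total wait time divided by the wait count, which makes for a quick way to sanity-check a report. For example, for the top two events:

```python
# Derive average wait (ms) from total wait seconds and the wait count,
# using rows from the Wait Events table above.
def avg_wait_ms(total_wait_s, waits):
    return total_wait_s * 1000 / waits

print(round(avg_wait_ms(221_360, 1_878_857)))  # read by other session -> 118
print(round(avg_wait_ms(40_703, 788_260)))     # db file scattered read -> 52
```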
Instance Activity Stats  DB/Inst: XNH/xnh  Snaps: 18125-18126

Statistic                                      Total     per Second     per Trans
-------------------------------- ------------------ -------------- -------------
CPU used by this session                     352,671           98.9         253.5
CPU used when call started                   352,075           98.7         253.1
consistent gets                          112,783,057       31,615.3      81,080.6
consistent gets - examination             21,534,665        6,036.6      15,481.4
physical write total bytes            21,642,505,728    6,066,809.5  15,558,954.5
physical write total multi block             254,612           71.4         183.0
physical writes                            2,628,898          736.9       1,889.9
sorts (disk)                                      91            0.0           0.1
sorts (memory)                                21,477            6.0          15.4
sorts (rows)                              31,991,849        8,967.9      22,999.2
table fetch by rowid                      36,834,231       10,325.3      26,480.4
table fetch continued row                    215,306           60.4         154.8
table scans (long tables)                        413            0.1           0.3
table scans (rowid ranges)                         0            0.0           0.0
table scans (short tables)                    37,612           10.5          27.0

Note:
"Consistent gets - examination" runs at 6,036.6 per second. It differs from regular consistent gets: it covers reads of undo blocks for consistent-read purposes, as well as the first part of an index read and hash cluster I/O. To reduce logical I/O, consider moving indexes to a tablespace with a larger block size; because index splitting and spawning are controlled at the block level, a larger block size results in a flatter index tree structure.
Disk write activity is high at 736.9 physical writes per second; also investigate moving the high-write datafiles to a smaller data buffer to improve the speed of the database writer process.
There were 91 disk sorts during this period. Disk sorts are very expensive, and increasing the PGA (sort_area_size or pga_aggregate_target) may allow these sorts to be performed in RAM.
There were also 215,306 "table fetch continued row" actions during this period.
Migrated/chained rows always double the I/O for a row fetch. A "table fetch continued row" (chained row fetch) happens when fetching BLOB/CLOB columns (if avg_row_len > db_block_size), when a table has more than 255 columns, and when PCTFREE is too small. You may need to reorganize the affected tables with the dbms_redefinition package and raise PCTFREE to prevent future row chaining. Small-table full scans are also high at 10.5 per second; verify that the KEEP pool is sized properly to cache frequently referenced tables and indexes, which will speed up these small-table full-table scans.

Latch Activity  DB/Inst: XNH/xnh  Snaps: 18125-18126
-> "Get Requests", "Pct Get Miss" and "Avg Slps/Miss" are statistics for
   willing-to-wait latch get requests
-> "NoWait Requests", "Pct NoWait Miss" are for no-wait latch get requests
-> "Pct Misses" for both should be very close to 0.0

                                           Pct    Avg   Wait                 Pct
                                    Get    Get   Slps   Time       NoWait NoWait
Latch Name                     Requests   Miss  /Miss    (s)     Requests   Miss
------------------------ -------------- ------ ------ ------ ------------ ------
cache buffer handles             30,813    3.8    0.0      1            0    N/A
cache buffers chains        213,408,891    2.6    0.0     51   22,360,756    0.0
cache buffers lru chain          20,329    0.3    0.1      0   58,559,893    0.3
cache table scan latch                0    N/A    N/A      0      789,489    0.1
library cache                   312,565    0.5    0.7   3091        1,537    3.4
library cache load lock              78    0.0    N/A      0            0    N/A
library cache lock              216,512    0.1    0.1      2            0    N/A
row cache objects               650,726    0.3    0.0      4            2    0.0

Note:
cache buffers chains latch misses are high at 2.6%; see MetaLink for notes on increasing the hidden parameter _db_block_hash_buckets.
library cache latch misses are high at 0.5%; consider pinning frequently used packages in the library cache with dbms_shared_pool.keep.
row cache objects latch misses stand at 0.3%. The row cache objects latch is a serialization latch that protects access to the data dictionary cache in the SGA.
When loading, referencing, and freeing objects in the data dictionary cache, this latch must be acquired. Increase the shared pool for a library cache latch problem well before this latch becomes one.

Buffer Pool Advisory  DB/Inst: XNH/xnh  Snap: 18126
-> Only rows with estimated physical reads > 0 are displayed
-> ordered by Block Size, Buffers For Estimate

     Size for  Size      Buffers for  Est Phys Read          Estimated
P     Est (M) Factor        Estimate         Factor     Physical Reads
--- --------- ------ --------------- -------------- ------------------
D         592     .1          73,223            1.3     46,090,393,895
D       1,184     .2         146,446            1.1     37,573,907,683
D       1,776     .3         219,669            1.0     35,503,304,710
D       2,368     .4         292,892            1.0     34,848,764,580
D       2,960     .5         366,115            1.0     34,617,687,453
D       3,552     .6         439,338            1.0     34,551,237,866
D       4,144     .7         512,561            1.0     34,533,179,892
D       4,736     .8         585,784            1.0     34,523,985,494
D       5,328     .9         659,007            1.0     34,516,845,065
D       5,920    1.0         732,230            1.0     34,514,614,007
D       6,512    1.1         805,453            1.0     34,514,184,590
D       7,104    1.2         878,676            1.0     34,514,070,996
D       7,696    1.3         951,899            1.0     34,513,996,408
D       8,288    1.4       1,025,122            1.0     34,513,949,119
D       8,880    1.5       1,098,345            1.0     34,513,713,104
D       9,472    1.6       1,171,568            1.0     34,513,206,899
D      10,064    1.7       1,244,791            1.0     34,510,889,606
D      10,656    1.8       1,318,014            1.0     34,491,356,735
D      11,248    1.9       1,391,237            1.0     34,276,387,077
D      11,840    2.0       1,464,460            0.8     26,280,391,813

Note: the current buffer pool of roughly 5,900M is still too small; given the current I/O volume it is well short of the advisory's suggested upper range of about 12G. Doubling db_cache_size would cut physical reads by an estimated 23.86%.

Segments by Logical Reads  DB/Inst: XNH/xnh  Snaps: 18125-18126
-> Total Logical Reads: 112,868,544
-> Captured Segments account for 97.7% of Total

           Tablespace                      Subobject  Obj.        Logical
Owner      Name       Object Name          Name       Type          Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
XNH_HUAXIA XNH        XNH_INP_DETAIL                  TABLE   30,240,640   26.79
XNH_HUAXIA XNH        XNH_MEMBER_INDEX_MEM            INDEX   26,863,952   23.80
XNH_HUAXIA XNH        XNH_OUTP_DETAIL                 TABLE   11,395,648   10.10
XNH_HUAXIA XNH        XNH_MEMBER                      TABLE    8,090,016    7.17
XNH_LUSHAN XNH        IDX_XNH_MEMBER_MEMBE            INDEX    5,263,456    4.66
-------------------------------------------------------------

Segments by Physical Reads  DB/Inst: XNH/xnh  Snaps: 18125-18126
-> Total Physical Reads: 9,725,292
-> Captured Segments account for 79.4% of Total

           Tablespace                      Subobject  Obj.       Physical
Owner      Name       Object Name          Name       Type          Reads  %Total
---------- ---------- -------------------- ---------- ----- ------------ -------
XNH_HUAXIA XNH        XNH_INP_DETAIL                  TABLE    4,672,605   48.05
XNH_LUSHAN XNH        XNH_INP_DETAIL                  TABLE    1,741,691   17.91
XNH_HUAXIA XNH        XNH_OUTP_DETAIL                 TABLE    1,153,276   11.86
XNH_LUSHAN XNH        XNH_OUTP_DETAIL                 TABLE      141,875    1.46

Segments by Buffer Busy Waits  DB/Inst: XNH/xnh  Snaps: 18125-18126
-> % of Capture shows % of Buffer Busy Waits for each top segment compared
   with total Buffer Busy Waits for all segments captured by the Snapshot

           Tablespace                      Subobject  Obj.    Buffer Busy    % of
Owner      Name       Object Name          Name       Type          Waits Capture
---------- ---------- -------------------- ---------- ----- ------------- -------
SYS        SYSTEM     SEQ$                            TABLE             1   25.00
XNH_HUAXIA XNH        XNH_INP_DETAIL                  TABLE             1   25.00
XNH_HUAXIA XNH        XNH_OUTP_DETAIL                 TABLE             1   25.00
XNH_LUSHAN XNH        XNH_OUTP_DETAIL                 TABLE             1   25.00

Note: the contended objects are XNH_INP_DETAIL, XNH_OUTP_DETAIL, and XNH_MEMBER_INDEX_MEM.

SQL ordered by Reads  DB/Inst: XNH/xnh  Snaps: 18125-18126
-> Total Disk Reads: 9,725,292
-> Captured SQL account for 103.2% of Total

                                    Reads      CPU      Elapsed
Physical Reads  Executions       per Exec %Total Time (s) Time (s)  SQL Id
-------------- ----------- ------------- ------ -------- --------- -------------
     4,719,205          50      94,384.1   48.5  3284.75 ######### 6w71x1s7bct0c
Module: JDBC Thin Client
select i.hospitalid, i.HospitalName as hname, count(1)as zyrs, sum(LIVEDAYS)as
zycrs, sum(p.yf)as ypfy, sum(p.crf)as crfy, sum(p.jcf)as jcfy, sum(p.zlf)as zlfy
, sum(p.qt)as qtfy, sum(i.SUM_CHARGES)as fyze, sum(i.CANREIMBURSED)as kbje, sum(
i.SUBVENTION)as bzje from XNH_INP_MASTER i left join XNH_FIXED_HOSPITAL f on f.

     2,006,026         232       8,646.7   20.6  1383.35 ######### 9fdhw3atcp0xu
Module: JDBC Thin Client
select i.hospitalid, i.hospitalname, count(1)as mzrcs, sum(p.yf)as ypfy, sum(p.
crf)as crfy, sum(p.jcf)as jcfy, sum(p.zlf)as zlfy, sum(p.qt)as qtfy, sum(i.SUM_C
HARGES)as fyze, sum(i.SUBVENTION)as bzje, sum(i.OWN_AMOUNT)as zfje, sum(i.ACCOUN
T_AMOUNT)as zhje from XNH_OUTP_MASTER i left join VIEW_OUTP_DETAIL_TOTAL p on p.
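The per-execution read volume of the top statement follows directly from the 8K block size reported in the Cache Sizes section. A quick check of the figures:

```python
# Physical read volume for SQL 6w71x1s7bct0c, using the 8K standard block
# size from the AWR Cache Sizes section.
BLOCK_BYTES = 8 * 1024
total_reads = 4_719_205        # Physical Reads over the snapshot hour
reads_per_exec = 94_384.1      # Reads per Exec from the report

gb_per_hour = total_reads * BLOCK_BYTES / 1024**3
mb_per_exec = reads_per_exec * BLOCK_BYTES / 1024**2

print(f"{gb_per_hour:.0f} GB/hour, {mb_per_exec:.0f} MB per execution")
```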
Note: look at the physical reads of the two SQL statements above. 6w71x1s7bct0c generated roughly 36 GB of physical reads in a single hour, about 737 MB per execution, executed roughly once a minute, and accounts for 48.5% of total disk reads; it comes in over a JDBC Thin Client connection. The other statement, 9fdhw3atcp0xu, is somewhat better but still accounts for 20.6%. If these two statements can be tuned, the current performance problem should largely be resolved. The execution plans, displayed with the dbms_xplan package:

PLAN_TABLE_OUTPUT
--------------------------------------------------------------------------------------------------------------
SQL_ID 6w71x1s7bct0c
--------------------
select i.hospitalid, i.HospitalName as hname, count(1)as zyrs, sum(LIVEDAYS)as zycrs,
sum(p.yf)as ypfy, sum(p.crf)as crfy, sum(p.jcf)as jcfy, sum(p.zlf)as zlfy, sum(p.qt)as qtfy,
sum(i.SUM_CHARGES)as fyze, sum(i.CANREIMBURSED)as kbje, sum(i.SUBVENTION)as bzje
from XNH_INP_MASTER i
left join XNH_FIXED_HOSPITAL f on f.HCODE=i.HOSPITALID
left join VIEW_INP_DETAIL_TOTAL p on p.INPID=i.INPMASTERID
where i.ICPCODE=:1 and i.STATE=2
  and i.CRTIME between TO_TIMESTAMP(:2||' 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
                   and TO_TIMESTAMP(:3||'23:59:59', 'YYYY-MM-DD HH24:MI:SS')
group by i.HOSPITALID,i.HospitalName order by i.HOSPITALID

Plan hash value: 446142272

---------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name                   | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |                        |       |       |       |   116K(100)|          |
|   1 |  FILTER                 |                        |       |       |       |            |          |
|   2 |   SORT GROUP BY         |                        |     1 |   149 |       |   116K  (2)| 00:23:16 |
|   3 |    HASH JOIN OUTER      |                        |     1 |   149 |       |   116K  (2)| 00:23:16 |
|   4 |     NESTED LOOPS OUTER  |                        |     1 |    79 |       |    780  (2)| 00:00:10 |
|   5 |      TABLE ACCESS FULL  | XNH_INP_MASTER         |     1 |    69 |       |    780  (2)| 00:00:10 |
|   6 |      INDEX UNIQUE SCAN  | IDX_FIX_HOSPITAL_HCODE |     1 |    10 |       |      0  (0)|          |
|   7 |     VIEW                | VIEW_INP_DETAIL_TOTAL  | 79155 |  5410K|       |   115K  (2)| 00:23:07 |
|   8 |      HASH GROUP BY      |                        | 79155 |  2318K|   636M|   115K  (2)| 00:23:07 |
|   9 |       TABLE ACCESS FULL | XNH_INP_DETAIL         |   14M |   429M|       |  71186  (1)| 00:14:15 |
---------------------------------------------------------------------------------------------------------

SQL_ID 6w71x1s7bct0c (same statement text as above)
--------------------
Plan hash value: 3722179718

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                       | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                            |       |       |       |   116K(100)|          |
|   1 |  FILTER                       |                            |       |       |       |            |          |
|   2 |   SORT GROUP BY               |                            |     1 |   156 |       |   116K  (2)| 00:23:22 |
|   3 |    HASH JOIN OUTER            |                            |     1 |   156 |       |   116K  (2)| 00:23:22 |
|   4 |     NESTED LOOPS OUTER        |                            |     1 |    86 |       |    308  (0)| 00:00:04 |
|   5 |      TABLE ACCESS BY INDEX ROWID| XNH_INP_MASTER           |     1 |    76 |       |    308  (0)| 00:00:04 |
|   6 |       INDEX RANGE SCAN        | IDX_XNH_INP_MASTER_ICPCODE |   829 |       |       |      6  (0)| 00:00:01 |
|   7 |      INDEX UNIQUE SCAN        | IDX_FIX_HOSPITAL_HCODE     |     1 |    10 |       |      0  (0)|          |
|   8 |     VIEW                      | VIEW_INP_DETAIL_TOTAL      | 92954 |  6354K|       |   116K  (2)| 00:23:18 |
|   9 |      HASH GROUP BY            |                            | 92954 |  2723K|   645M|   116K  (2)| 00:23:18 |
|  10 |       TABLE ACCESS FULL       | XNH_INP_DETAIL             |   15M |   434M|       |  71471  (1)| 00:14:18 |
----------------------------------------------------------------------------------------------------------------------

56 rows selected.
PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------
SQL_ID 9fdhw3atcp0xu
--------------------
select i.hospitalid, i.hospitalname, count(1)as mzrcs, sum(p.yf)as ypfy, sum(p.crf)as crfy,
sum(p.jcf)as jcfy, sum(p.zlf)as zlfy, sum(p.qt)as qtfy, sum(i.SUM_CHARGES)as fyze,
sum(i.SUBVENTION)as bzje, sum(i.OWN_AMOUNT)as zfje, sum(i.ACCOUNT_AMOUNT)as zhje
from XNH_OUTP_MASTER i
left join VIEW_OUTP_DETAIL_TOTAL p on p.REGID=i.REGID
where i.ICPCODE=:1 and i.OPERATERTYPE='3' and i.STATE=2
  and i.INPUTDT between TO_TIMESTAMP(:2||' 00:00:00', 'YYYY-MM-DD HH24:MI:SS')
                    and TO_TIMESTAMP(:3||'23:59:59', 'YYYY-MM-DD HH24:MI:SS')
group by i.HOSPITALID,i.hospitalname order by i.HOSPITALID,i.hospitalname

Plan hash value: 3047430753

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                          | Name                        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                   |                             |       |       |       | 26831 (100)|          |
|   1 |  FILTER                            |                             |       |       |       |            |          |
|   2 |   SORT GROUP BY                    |                             |  2085 |   413K|       | 26831   (2)| 00:05:22 |
|   3 |    HASH JOIN OUTER                 |                             |  2085 |   413K|       | 26830   (2)| 00:05:22 |
|   4 |     TABLE ACCESS BY INDEX ROWID    | XNH_OUTP_MASTER             |  2085 |   213K|       |  2156   (1)| 00:00:26 |
|   5 |      BITMAP CONVERSION TO ROWIDS   |                             |       |       |       |            |          |
|   6 |       BITMAP AND                   |                             |       |       |       |            |          |
|   7 |        BITMAP CONVERSION FROM ROWIDS|                            |       |       |       |            |          |
|   8 |         INDEX RANGE SCAN           | IDX_XNH_OUTP_MASTER_ICPCODE |       |       |       |    91   (0)| 00:00:02 |
|   9 |        BITMAP CONVERSION FROM ROWIDS|                            |       |       |       |            |          |
|  10 |         SORT ORDER BY              |                             |       |       |  3160K|            |          |
|  11 |          INDEX RANGE SCAN          | IDX_X_OUTPUT_MASTER_INPUTDT |       |       |       |   454   (1)| 00:00:06 |
|  12 |     VIEW                           | VIEW_OUTP_DETAIL_TOTAL      |  624K |   58M |       | 24669   (2)| 00:04:57 |
|  13 |      HASH GROUP BY                 |                             |  624K |   23M |   186M| 24669   (2)| 00:04:57 |
|  14 |       TABLE ACCESS FULL            | XNH_OUTP_DETAIL             | 3122K |  116M |       |  9418   (1)| 00:01:54 |
---------------------------------------------------------------------------------------------------------------------------

SQL_ID 9fdhw3atcp0xu (same statement text as above)
--------------------
Plan hash value: 3240792898

----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name                        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |                             |       |       |       |  8422 (100)|          |
|   1 |  FILTER                       |                             |       |       |       |            |          |
|   2 |   SORT GROUP BY               |                             |     2 |   404 |       |  8422   (2)| 00:01:42 |
|   3 |    HASH JOIN OUTER            |                             |     2 |   404 |       |  8421   (2)| 00:01:42 |
|   4 |     TABLE ACCESS BY INDEX ROWID| XNH_OUTP_MASTER            |     2 |   208 |       |     7   (0)| 00:00:01 |
|   5 |      INDEX RANGE SCAN         | IDX_XNH_OUTP_MASTER_ICPCODE |    24 |       |       |     3   (0)| 00:00:01 |
|   6 |     VIEW                      | VIEW_OUTP_DETAIL_TOTAL      |  214K |   20M |       |  8412   (2)| 00:01:41 |
|   7 |      HASH GROUP BY            |                             |  214K |  8156K|    63M|  8412   (2)| 00:01:41 |
|   8 |       TABLE ACCESS FULL       | XNH_OUTP_DETAIL             | 1070K |   39M |       |  3184   (1)| 00:00:39 |
----------------------------------------------------------------------------------------------------------------------

56 rows selected.
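Circling back to the Buffer Pool Advisory for a moment: the 23.86% figure quoted in that note is the estimated drop in physical reads between the 1.0x row and the 2.0x row, which is easy to verify:

```python
# Estimated physical-read reduction from doubling db_cache_size,
# using the Buffer Pool Advisory rows above (size factor 1.0 vs 2.0).
baseline_reads = 34_514_614_007   # 5,920M buffer pool, factor 1.0
doubled_reads = 26_280_391_813    # 11,840M buffer pool, factor 2.0

reduction_pct = (baseline_reads - doubled_reads) / baseline_reads * 100
print(f"{reduction_pct:.2f}% fewer estimated physical reads")
```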
That wraps up the analysis for now. The SQL has been handed off to the developers, and we will keep monitoring.
This write-up drew on the experience of Don Burleson, Jamon Bowen, and others; thanks to them.