Oceanbase 存储空间使用率高统计分析方法

首页 » OceanBase » Oceanbase 存储空间使用率高统计分析方法

Oceanbase 存储空间使用率高统计分析方法

2024/09/17
OceanBase
51 views
Oceanbase 存储空间使用率高统计分析方法已关闭评论

在Oceanbase数据库日常运维中，像oracle一样数据库的存储空间当到在上限时会在日志或ocp中提示预警，可能磁盘空间物理大小限制或DB参数限制，分析空间不足的原因在分布式数据库OCEANBASE中相比ORACLE要复杂一些，如何查看当前的使用大小？是哪一类文件占用较大？如果是temp文件是DDL 还是SQL 查询产生的？有没有可能temp文件泄露没有释放？如何定位temp使用高？本篇仅记录一些方法

空间使用查询视图
oceanbase.__all_virtual_disk_stat
该内部表仅支持在 sys 租户下查询,记录每个 Server 的磁盘占用信息

字段说明

字段名称	类型	是否为 NULL	描述
svr_ip	varchar(46)	NO	IP 地址
svr_port	bigint(20)	NO	端口号
total_size	bigint(20)	NO	总大小，单位为字节
used_size	bigint(20)	NO	占用空间，单位为字节
free_size	bigint(20)	NO	空闲大小，单位为字节
is_disk_valid	bigint(20)	NO	是否有效 1：有效 0：无效，磁盘存在问题
disk_error_begin_ts	bigint(20)	NO	出错时间

# Log on to the OceanBase database of the OceanBase cluster as the root@sys user. 
obclient -h192.168.0.1 -P2881 -u -p_******_ -Doceanbase -A

# Query the disk capacity (total_size) and usage (used_size) of each OBServer in the OceanBase cluster. 
SELECT
 svr_ip,
 svr_port,
 ROUND(total_size / 1024 / 1024 / 1024, 2) AS TOTAL_GB,
 ROUND(free_size / 1024 / 1024 / 1024, 2) AS FREE_GB,
 ROUND((total_size - free_size)  / 1024 / 1024 / 1024, 2) AS USED_GB,
 ROUND((total_size - free_size) / total_size * 100, 2) AS USED_PCT
FROM
 oceanbase.__all_virtual_disk_stat order by 1;

| svr_ip         | svr_port | TOTAL | FREE  | USED  | USED_PCT |
+----------------+----------+-------+-------+-------+----------+
| 192.168.56.100 |     2882 | 9.99G | 9.84G | 0.15G | 1.52%    |
+----------------+----------+-------+-------+-------+----------+
1 row in set (0.198 sec)

报错警告日志
oceanbase/log/observer.log

生成时间：xxxx-xx-xx Txx:xx:xx+08:00  ob_cluster=dp_ob_cluster_01:host=xx.xx.xx.xx:server_type=observer:ob_error_code=4184 
详情：集群：dp_ob_cluster_01 主机：xx.xx.xx.xx
日志类型：observer 错误码：4184 TraceId：Y0-0000000000000000 日志名称：OB_CS_OUTOF_DISK_SPACE 日志级别：
error 日志详情：[xxxx-xx-xx xx:xx:xx.xxxxx] ERROR [STORAGE] check_disk_full 
(ob_store_file.cpp:1147) [37222][868][xxxxx-xxxxx] [lt=10] [dc=0] disk is almost full(ret=-4184, required_size=0, required_count=0,
 free_count=137037, used_percent=90, 
super_block_={header:{super_block_size:135, version:2, magic:1012, attr:0}, 
content:{create_timestamp:xxxxx, modify_timestamp:xxxxx, macro_block_size:2097152, total_macro_block_count:1250045, free_macro_block_count:137037, total_file_size:2621534371840, replay_start_point:ObLogCursor{file_id=106, log_id=48317862, offset=23781376}, macro_block_meta:{block_index:637952, log_seq:0, file_id:0, file_size:0}, partition_meta:{block_index:638566, log_seq:48127611, file_id:0, file_size:0}, table_mgr_meta:{block_index:638832, log_seq:48317860, file_id:0, file_size:0}, tenant_config_meta:{block_index:638840, log_seq:16233199, file_id:0, file_size:0}}}) BACKTRACE:0xcc24b9a 0xcb43f12 0x6fff1ee 0x6fffa89 0x6fe78f1 0x6fdba45 0x6fdbacb 0x80627be 0xcb7cc83 0x37df1f9 0xc9c3e87 0xx9c0d70 0xc9bfc0f

可能的原因
当 used_percent 大于 data_disk_usage_limit_percentage (默认 90) 时，就会触发上述告警。其中 used_percent = 1 – 空闲的宏块数量（free_count）/文件中总的宏块数量。一天只会告警一次。

或磁盘配额使用达到上限datafile_size参数上限。

查看数据库空间限制

[root@localhost ~]# obclient -h192.168.56.100 -P2881 -uroot -p'xxxx' -Doceanbase -A
Welcome to the OceanBase.  Commands end with ; or \g.
Your OceanBase connection id is 3221487655
Server version: OceanBase 4.2.1.2 (r102010012023120119-130bf91ba413a00bb696fe8853906fde1f29f83d) (Built Dec  1 2023 20:00:25)

Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.


obclient [oceanbase]> SHOW PARAMETERS LIKE '%datafile_size%' \G
*************************** 1. row ***************************
      zone: zone1
  svr_type: observer
    svr_ip: 192.168.1.21
  svr_port: 2882
      name: datafile_size
 data_type: NULL
     value: 500G
      info: size of the data file. Range: [0, +∞)
   section: SSTABLE
     scope: CLUSTER
    source: DEFAULT
edit_level: DYNAMIC_EFFECTIVE

obclient [oceanbase]> SHOW PARAMETERS LIKE '%data_disk_usage_limit_percentage%' \G
*************************** 1. row ***************************
      zone: zone1
  svr_type: observer
    svr_ip: 192.168.56.100
  svr_port: 2882
      name: data_disk_usage_limit_percentage
 data_type: NULL
     value: 90
      info: the safe use percentage of data diskRange: [50,100] in integer
   section: OBSERVER
     scope: CLUSTER
    source: DEFAULT
edit_level: DYNAMIC_EFFECTIVE
1 row in set (0.152 sec)

obclient [oceanbase]> show parameters like 'datafile_disk_percentage' \G;
*************************** 1. row ***************************
      zone: zone1
  svr_type: observer
    svr_ip: 192.168.56.100
  svr_port: 2882
      name: datafile_disk_percentage
 data_type: NULL
     value: 20
      info: the percentage of disk space used by the data files. Range: [0,99] in integer
   section: SSTABLE
     scope: CLUSTER
    source: DEFAULT
edit_level: DYNAMIC_EFFECTIVE
1 row in set (0.057 sec)

分析方法
1，从OCP 查看I/O 吞吐量高时间段
2，找该时间段增加对象
3，找该时间段是否有产生临时段的SQL(e.g. hashgroup by, creating index )
4, 分析observer.log日志
5, Oceanbase Bug (4.1存在已知bug)

像上面日志的报错，OB官方有个案例，继续分析查看宏块的使用情况。
通过 observer.log 日志搜索 free_block_cnt 关键字，可以看到宏块剩余的变化，是否有短时间内暴增，案例中宏块从剩余 2 万多到剩余 1 万多宏块。

observer.log:xxxxx:[xxxx-xx-xx xx:xx:xx.xxxxx] INFO  [STORAGE] ob_store_file.cpp:670 [7431][2922][xxxxx-xxxxx] [lt=11] [dc=0] Async write macro block, 
(macro_block_id=1095908-18-0-0[xxxxx], free_block_cnt=226825)
.............
observer.log:xxxxx:[xxxx-xx-xx xx:xx:xx.xxxxx] INFO  [STORAGE] ob_store_file.cpp:670 [7431][2922][xxxxx-xxxxx] [lt=16] [dc=0] Async write macro block, 
(macro_block_id=710747-20-0-0[xxxxx], free_block_cnt=115557)

查看调用堆栈

observer.log:xxxxx:[xxxx-xx-xx xx:xx:xx.xxxxx] INFO  [STORAGE] ob_macro_file.cpp:230 [7431][2922][xxxxx-xxxxx] [lt=3] [dc=0] succeed to open macro file.(fd=150282, 
common::lbt()="0xcc24b9a 0x7cadfb4 0x7ca7a99 0x7caa8c9 0xa3434d7 0xa3438fc 0xa343f12 0xa3442ab 0xab8d866 0xab8b1f9 0xa883c34 0xaf500bf 0xb118540 0x931c418 0x931cdf6 0x9de61fb 0xcd73192 0x9e78952 0x9e70ffa 0x9e7272b 0xc9c3e87 0xc9c0d70 0xc9bfc0f")

使用 addr2line 工具解析堆栈信息。

addr2line -pCfe /home/admin/oceanbase/bin/observer 0xcc24b9a 0x7cadfb4 0x7ca7a99 0x7caa8c9 0xa3434d7 0xa3438fc 0xa343f12 0xa3442ab 0xab8d866 0xab8b1f9 0xa883c34 0xaf500bf 0xb118540 0x931c418 0x931cdf6 0x9de61fb 0xcd73192 0x9e78952 0x9e70ffa 0x9e7272b 0xc9c3e87 0xc9c0d70 0xc9bfc0f

结果

oceanbase::common::lbt() at ??:?
oceanbase::blocksstable::ObMacroFile::init(int, oceanbase::common::ObIAllocator&)::{lambda(char const*)#1}::operator()(char const*) const [clone .constprop.266] at ob_macro_file.cpp:?
oceanbase::blocksstable::ObMacroFile::init(int, oceanbase::common::ObIAllocator&) at ??:?
oceanbase::blocksstable::ObMacroFileManager::open(int&) at ??:?
oceanbase::sql::ObChunkRowStore::write_file(void*, long) at ??:?
oceanbase::sql::ObChunkRowStore::dump(bool, bool) at ??:?
oceanbase::sql::ObChunkRowStore::switch_block(long) at ??:?
oceanbase::sql::ObChunkRowStore::add_row(oceanbase::common::ObNewRow const&, oceanbase::sql::ObChunkRowStore::StoredRow**) at ??:?
oceanbase::sql::ObHashGroupBy::load_data(oceanbase::sql::ObExecContext&) const at ??:?
oceanbase::sql::ObHashGroupBy::inner_get_next_row(oceanbase::sql::ObExecContext&, oceanbase::common::ObNewRow const*&) const at ??:?
oceanbase::sql::ObPhyOperator::get_next_row(oceanbase::sql::ObExecContext&, oceanbase::common::ObNewRow const*&) const at ??:?

调用栈信息显示是在执行 OceanBase HashGroup By 时发生了磁盘写入的操作，可以推断是在 SQL 执行的过程中产生了临时数据的落盘操作，怀疑有大 SQL 执行。

统计空间文件类型使用空间
对上面的temp使用，也可以直接通过VIEW分析

[oceanbase]> select svr_ip,
     round(sum(case when file_type='tenant file data' then data_size else 0 end)/1024/1024) filedata_GB,
     round(sum(case when file_type='tenant file meta data' then data_size else 0 end)/1024/1024) metadata_GB,
     round(sum(case when file_type='tenant index data' then data_size else 0 end)/1024/1024) indexdata_GB,
     round(sum(case when file_type='tenant log data' then data_size else 0 end)/1024/1024) logdata_GB,
     round(sum(case when file_type='tenant tmp data' then data_size else 0 end)/1024/1024) tmpdat_GBa
     from  oceanbase.__all_space_usage
     group by svr_ip;
+---------------+----------+----------+-----------+---------+---------+
| svr_ip        | filedata | metadata | indexdata | logdata | tmpdata |
+---------------+----------+----------+-----------+---------+---------+
| 192.168.56.16 |  1043364 |      550 |    104070 |   14579 |       0 |
| 192.168.56.2  |   612054 |      500 |    205576 |   18023 |       0 |
| 192.168.56.20 |  1305380 |      576 |    162016 |   38270 |  135230 |
...
+---------------+----------+----------+-----------+---------+---------+

obclient> select * from __all_virtual_macro_block_marker_status where svr_ip = 'xxx' and svr_port = xx;

从内部表_ _all_virtual_macro_block_marker_status 或 __all_space_usage 获取临时文件空间使用情况，V3.2以后 __all_space_usage 表，查看 file type 为 tenant tmp data。其它相关VIEW __all_virtual_sys_task_status, V$OB_DTL_INTERM_RESULT_MONITOR, __all_virtual_processlist、__all_virtual_sql_workarea_active 、__all_virtual_dtl_interm_result_monitor 查找正在进程的大任务，如index create, 或大SQL的临时段占用.

单租户临时文件最大磁盘总空间
OceanBase 数据库 V2.2.x 及之前版本，不存在。
OceanBase 数据库 V3.x 版本：单个租户，最大可用临时文件总空间为 4TB，不可以调整。
OceanBase 数据库 V4.x 版本：单个租户，与租户内存大小相关，tenant_memory * 70% * 500GB 磁盘空间，最大可达 100TB。

如何判断是否达到单租户临时文件最大磁盘总空间？
查询 __all_space_usage 看临时文件磁盘占用大小，或查看 __all_virtual_memory_info 表中 label 为 TmpBlockManager 的内存是否达到上限（3x: 15GB，4x:租户内存70%）。

obclient> select * from __all_virtual_memory_info where label = 'TmpBlockManager' and svr_ip = 'xxx' and svr_port = xx;

该view还可以判断是否达到单个 observer 节点同一时刻 read IO 最大带宽(如查询下面的SQL)，在 OceanBase 数据库 V3.x 版本内，该模块存在已知的内存泄露，临时文件 READ IO 内存发生泄露导致内存容量超出了4GB 的最大限制

select * from __all_virtual_memory_info where label = 'TmpPageCache' and svr_ip = 'xxx' and svr_port = xx;

临时空间放大
对于临时文件，空间放大是指实际磁盘占用大于写入数据的数据量。简单地说，写入的 buffer 的大小是固定的，当写入并发度（dir_id）大于写入 buffer 中可以容纳的宏块个数时，就有可能会产生写入放大问题。可以使用下面的方法，确认临时文件宏块是否发生了空间放大。

OceanBase 数据库 V4.1.x 及之前版本。

grep 'ob_tmp_file*' observer.log.2023031022* | grep 'succeed to wash a block' | grep -o 'free_page_nums:[0-9]*' | awk -F ':' '{ sum += $2; } END { print "total_free_page_num = " sum; print "avg_free_page_nums = " sum/NR; print "enlargement_factor = " (252*NR)/(252*NR - sum); print "macro_block_count = " NR }'

OceanBase 数据库 V4.2.x 及之后版本。

grep 'ob_tmp_file*' observer.log.2023031022* | grep 'succeed to wash a block' | grep -o 'free_page_nums=[0-9]*' | awk -F '=' '{ sum += $2; } END { print "total_free_page_num = " sum; print "avg_free_page_nums = " sum/NR; print "enlargement_factor = " (252*NR)/(252*NR - sum); print "macro_block_count = " NR }'

如果发生空间放大，通常可以采用下面两种调整方法，降低放大:

降低并发度，确保写入buffer的宏块个数少于文件写入并发数。
增大buffer大小，可以通过调整租户级 _temporary_file_io_area_size 配置项。

临时文件泄露
如果一个任务开始时打开了一个临时文件，但在该任务结束时却没有删除该文件，即临时文件泄露。一旦发生了临时文件泄露，会导致磁盘空间长期被占用，无法回收。

发生临时文件泄漏可以通过如下方法找出文件泄漏点属于模块。

大查询超时设置不合理，导致残留临时文件，可以查询 __all_virtual_dtl_interm_result_monitor。

如果没有可以，则按照如下方法找出泄漏点属于模块。

找出所有临时文件日志。

grep 'ob_tmp_file' observer.log* > ~/tmp_file.log

找出打开临时文件日志。

grep 'succeed to open a tmp file' tmp_file.log | grep -o 'fd=[0-9]*' | awk -F '=' '{print $2}' | sort -h > open.fd

找出删除临时文件日志。

grep 'succeed to remove a tmp file' tmp_file.log | grep -o 'fd=[0-9]*' | awk -F '=' '{print $2}' | sort -h > remove.fd

找出只打开没有删除的临时文件 fd。
```
grep -f remove.fd -F -v -x open.fd
```

找出只打开没有删除的临时文件日志。

grep 'succeed to open a tmp file' tmp_file.log | grep 'fd=11'

找出只打开没有删除的临时文件函数调用堆栈。

addr2line -pCfe bin/observer 0x1aae97c7 0x14ac6511 0x14ac5363 0xf530a2b 0xf52fa9b 0xf53a0bd 0xf52c62c 0xf480d43 0xf47ee17 0xf4448c5 0xe6cb79c 0x5e7e4ac 0x5e7e0e2 0x60c5dd6 0x5e37331 0xf0b4f3f 0x5e37331 0xe1cfc40 0xe287349 0xe1d23c7 0x604c94a 0x601848c 0x79262e5 0x5cd8eae 0x792486c 0x5cd7014 0x7925594 0x1b230a66 0x1b220c25 0x7f03699a8e25 0x7f0369467f1d

如果真的发生了临时文件泄露，可以通过重启发生泄露的 observer 进程即可。

DDL时磁盘空间不足的问题, 目前obdiag支持了增加索引时报磁盘不足问题的场景，目前支持4.2.1.0及之后的版本

obdiag rca run --scene=ddl_disk_full --input_parameters='{"tenant_name":"test1","table_name":"t555","action_type":"add_index","index_name":"k1"}'

References
https://www.oceanbase.com/knowledge-base/oceanbase-database-1000000000685638
https://www.modb.pro/db/1805068866321715200

打赏

__all_virtual_disk_stat, oceanbase空间使用

对不起，这篇文章暂时关闭评论。

上一篇：聊聊Oceanbase的悬挂事务

下一篇：在PostgreSQL中主键使用 UUIDs vs. bigserial

ANBOB™