OceanBase执行计划(一):索引与回表访问
在oceanbase中对于表与索引的访问路径在查看执行计划时,发现和oracle理解还是有一定的区别, 例如在oracle中full table scan只访问表, TABLE ACCESS BY INDEX ROWID 配合 index range scan或INDEX UNIQUE SCAN是索引扫描加回表,如果没有带TABLE ACCESS BY INDEX ROWID 表示不用回表,从索引中取到数据,也可以叫做index only scan 或者是covering index. 那是因为oracle是heap table(这里不算IOT索引组织表), 所以索引与表分离, 但是像MySQL及同系的GoldenDB、GreatDB等、Oceanbase及达梦都是一种索引组织表,同Oracle中的IOT表及索引,创建的其它索引为二级索引。下面测试一下OB(以下表示oceanbase)的执行计划的显示.
这里使用Oceanbase for mysql租户。
[admin@OceanBase1 ~]$ obclient -h172.20.22.107 -uroot@sys -P2881 -p -A Enter password: Welcome to the OceanBase. Commands end with ; or \g. Your OceanBase connection id is 3221663541 Server version: OceanBase 3.2.4.1 (r101000052023010822-346aa35c32e99d1b82d713f75f0072c45bdf7aab) (Built Jan 8 2023 22:52:43) Copyright (c) 2000, 2018, OceanBase and/or its affiliates. All rights reserved. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. obclient [(none)]> show databases; obclient [(none)]> use test Database changed obclient [test]> CREATE TABLE employee1 (id int,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255)); Query OK, 0 rows affected (0.304 sec) obclient [test]> CREATE TABLE employee2 (id int primary key,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255)); Query OK, 0 rows affected (0.107 sec) obclient [test]> create index idx_emp_1 on employee1(id); -- 二级索引 Query OK, 0 rows affected (0.791 sec) obclient [test]> DELIMITER $$ obclient [test]> CREATE PROCEDURE BulkInsert() BEGIN DECLARE i INT DEFAULT 1; truncate table employee1; truncate table employee2; WHILE (i <= 20000) DO INSERT INTO employee1 (id, FirstName, Address) VALUES(i, CONCAT("user","-",i), CONCAT("address","-",i)); INSERT INTO employee2 (id,FirstName, Address) VALUES(i,CONCAT("user","-",i), CONCAT("address","-",i)); SET i = i+1; END WHILE; END $$ Query OK, 0 rows affected (0.070 sec) obclient [test]> show index from employee2; +-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | +-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+ | employee2 | 0 | PRIMARY | 1 | id | A | NULL | NULL | NULL | | BTREE | available | | YES | +-----------+------------+----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+ 1 row in set (0.004 sec) obclient [test]> show index from employee1; +-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | +-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+ | employee1 | 1 | idx_emp_1 | 1 | id | A | NULL | NULL | NULL | YES | BTREE | available | | YES | +-----------+------------+-----------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+ 1 row in set (0.004 sec) obclient [test]> call BulkInsert; -> $$ Query OK, 1 row affected (1 min 20.434 sec)
创建了两张表employee1 和employee2. employee1 的创建表未指定PK,内部会自动创建PK, employee2的ID 为pk 在创建表时指定。
如何验证会自动创建的隐藏列?
在我之前的《如何查询OceanBase的数据字典或VIRTUAL TABLES?》中我们先从oracle DBA角度出发,找dba_tab_columns view, 通过py源码找到SYS.ALL_VIRTUAL_COLUMN_REAL_AGENT,
obclient [test]> select table_id,TABLE_NAME from sys.ALL_VIRTUAL_TABLE_REAL_AGENT where UPPER(TABLE_NAME) LIKE 'EMP%';
+---------------+------------+
| table_id | TABLE_NAME |
+---------------+------------+
| 1099511677788 | employee1 |
| 1099511677789 | employee2 |
+---------------+------------+
2 rows in set (0.009 sec)
obclient [test]> desc sys.ALL_VIRTUAL_COLUMN_REAL_AGENT
-> ;
+-----------------------------+-----------------------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+-----------------------------------+------+-----+---------+-------+
| TENANT_ID | decimal(38,0) | NO | PRI | NULL | |
| TABLE_ID | decimal(38,0) | NO | PRI | NULL | |
| COLUMN_ID | decimal(38,0) | NO | PRI | NULL | |
| COLUMN_NAME | varchar(128) | NO | | NULL | |
| ROWKEY_POSITION | decimal(38,0) | NO | | NULL | |
| INDEX_POSITION | decimal(38,0) | NO | | NULL | |
| ORDER_IN_ROWKEY | decimal(38,0) | NO | | NULL | |
| PARTITION_KEY_POSITION | decimal(38,0) | NO | | NULL | |
| DATA_TYPE | decimal(38,0) | NO | | NULL | |
| DATA_LENGTH | decimal(38,0) | NO | | NULL | |
| DATA_PRECISION | decimal(38,0) | YES | | NULL | |
| DATA_SCALE | decimal(38,0) | YES | | NULL | |
| ZERO_FILL | decimal(38,0) | NO | | NULL | |
| NULLABLE | decimal(38,0) | NO | | NULL | |
| ON_UPDATE_CURRENT_TIMESTAMP | decimal(38,0) | NO | | NULL | |
| AUTOINCREMENT | decimal(38,0) | NO | | NULL | |
| IS_HIDDEN | decimal(38,0) | NO | | NULL | |
| COLLATION_TYPE | decimal(38,0) | NO | | NULL | |
| ORIG_DEFAULT_VALUE | varchar(262144) | YES | | NULL | |
| CUR_DEFAULT_VALUE | varchar(262144) | YES | | NULL | |
| COMMENT | longtext | YES | | NULL | |
| SCHEMA_VERSION | decimal(38,0) | NO | | NULL | |
| COLUMN_FLAGS | decimal(38,0) | NO | | NULL | |
| PREV_COLUMN_ID | decimal(38,0) | NO | | NULL | |
| EXTENDED_TYPE_INFO | varbinary(65536) | YES | | NULL | |
| ORIG_DEFAULT_VALUE_V2 | varbinary(262144) | YES | | NULL | |
| CUR_DEFAULT_VALUE_V2 | varbinary(262144) | YES | | NULL | |
| SRS_ID | decimal(38,0) | NO | | NULL | |
| GMT_CREATE | timestamp(6) with local time zone | NO | | NULL | |
| GMT_MODIFIED | timestamp(6) with local time zone | NO | | NULL | |
+-----------------------------+-----------------------------------+------+-----+---------+-------+
30 rows in set (0.003 sec)
obclient [test]> SELECT TABLE_ID,COLUMN_ID,COLUMN_NAME,IS_HIDDEN,AUTOINCREMENT FROM sys.ALL_VIRTUAL_COLUMN_REAL_AGENT WHERE TABLE_ID IN(1099511677788,1099511677789) ORDER BY 1,2;
Empty set (0.005 sec)
obclient [test]> SELECT * FROM sys.ALL_VIRTUAL_COLUMN_REAL_AGENT;
Empty set (0.004 sec)
-- 注意是空, 然后找找__开头的
obclient [sys]> select table_name from information_schema.tables where table_name like '__all_virtual%column%' order by 1;
+------------------------------------------+
| table_name |
+------------------------------------------+
| __all_virtual_column |
| __all_virtual_column_history |
| __all_virtual_column_stat |
| __all_virtual_column_statistic |
| __all_virtual_column_stat_v2 |
| __all_virtual_column_stat_v2_history |
| __all_virtual_column_usage |
| __all_virtual_constraint_column |
| __all_virtual_constraint_column_history |
| __all_virtual_core_column_table |
| __all_virtual_foreign_key_column |
| __all_virtual_foreign_key_column_history |
| __ALL_VIRTUAL_INFORMATION_COLUMNS |
| __all_virtual_sstable_column_checksum |
+------------------------------------------+
14 rows in set (0.011 sec)
obclient [sys]> desc oceanbase.__all_virtual_column;
+-----------------------------+-------------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------------------+-------------------+------+-----+---------+-------+
| tenant_id | bigint(20) | NO | PRI | NULL | |
| table_id | bigint(20) | NO | PRI | NULL | |
| column_id | bigint(20) | NO | PRI | NULL | |
| gmt_create | timestamp(6) | NO | | NULL | |
| gmt_modified | timestamp(6) | NO | | NULL | |
| column_name | varchar(128) | NO | | | |
| rowkey_position | bigint(20) | NO | | 0 | |
| index_position | bigint(20) | NO | | NULL | |
| order_in_rowkey | bigint(20) | NO | | NULL | |
| partition_key_position | bigint(20) | NO | | NULL | |
| data_type | bigint(20) | NO | | NULL | |
| data_length | bigint(20) | NO | | NULL | |
| data_precision | bigint(20) | YES | | NULL | |
| data_scale | bigint(20) | YES | | NULL | |
| zero_fill | bigint(20) | NO | | NULL | |
| nullable | bigint(20) | NO | | NULL | |
| on_update_current_timestamp | bigint(20) | NO | | NULL | |
| autoincrement | bigint(20) | NO | | NULL | |
| is_hidden | bigint(20) | NO | | 0 | |
| collation_type | bigint(20) | NO | | NULL | |
| orig_default_value | varchar(262144) | YES | | NULL | |
| cur_default_value | varchar(262144) | YES | | NULL | |
| comment | longtext | YES | | NULL | |
| schema_version | bigint(20) | NO | | NULL | |
| column_flags | bigint(20) | NO | | 0 | |
| prev_column_id | bigint(20) | NO | | -1 | |
| extended_type_info | varbinary(65536) | YES | | NULL | |
| orig_default_value_v2 | varbinary(262144) | YES | | NULL | |
| cur_default_value_v2 | varbinary(262144) | YES | | NULL | |
| srs_id | bigint(20) | NO | | 0 | |
+-----------------------------+-------------------+------+-----+---------+-------+
30 rows in set (0.004 sec)
obclient [sys]> SELECT TABLE_ID,COLUMN_ID,COLUMN_NAME,IS_HIDDEN,AUTOINCREMENT FROM oceanbase.__all_virtual_column WHERE TABLE_ID IN(1099511677788,1099511677789) ORDER BY 1,2;
+---------------+-----------+----------------+-----------+---------------+
| TABLE_ID | COLUMN_ID | COLUMN_NAME | IS_HIDDEN | AUTOINCREMENT |
+---------------+-----------+----------------+-----------+---------------+
| 1099511677788 | 1 | __pk_increment | 1 | 1 | --------在这里 employee1的隐藏PK
| 1099511677788 | 16 | id | 0 | 0 |
| 1099511677788 | 17 | LastName | 0 | 0 |
| 1099511677788 | 18 | FirstName | 0 | 0 |
| 1099511677788 | 19 | Address | 0 | 0 |
| 1099511677788 | 20 | profile | 0 | 0 |
| 1099511677789 | 16 | id | 0 | 0 | -------employee2 ,注意这里的column_id 预留了15个值
| 1099511677789 | 17 | LastName | 0 | 0 |
| 1099511677789 | 18 | FirstName | 0 | 0 |
| 1099511677789 | 19 | Address | 0 | 0 |
| 1099511677789 | 20 | profile | 0 | 0 |
+---------------+-----------+----------------+-----------+---------------+
11 rows in set (0.057 sec)
Note:
通过oceanbase.__all_virtual_column 可以查看所有列,包含隐藏列pk
未指定PK表(employee1) 索引覆盖
obclient [test]> explain extended select id from employee1 where id=1 \G *************************** 1. row *************************** Query Plan: =================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| --------------------------------------------------- |0 |TABLE SCAN|employee1(idx_emp_1)|1 |46 | =================================================== Outputs & filters: ------------------------------------- 0 - output([employee1.id(0x7fb75ca059e0)]), filter(nil), access([employee1.id(0x7fb75ca059e0)]), partitions(p0), is_index_back=false, range_key([employee1.id(0x7fb75ca059e0)], [employee1.__pk_increment(0x7fb75ca46a50)]), range(1,MIN ; 1,MAX), range_cond([employee1.id(0x7fb75ca059e0) = 1(0x7fb75ca052c0)]) Used Hint: ------------------------------------- /*+ */ Outline Data: ------------------------------------- /*+ BEGIN_OUTLINE_DATA INDEX(@"SEL$1" "test.employee1"@"SEL$1" "idx_emp_1") END_OUTLINE_DATA */ Plan Type: ------------------------------------- LOCAL Optimization Info: ------------------------------------- employee1:table_rows:20000, physical_range_rows:1, logical_range_rows:1, index_back_rows:0, output_rows:1, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[idx_emp_1], pruned_index_name[idx_emp_name1,employee1], estimation info[table_id:1099511677791, (table_type:1, version:0-1702231210412936-1702231210412936, logical_rc:1, physical_rc:1), (table_type:7, version:1702231200914594-1702231200914594-1702231229329434, logical_rc:0, physical_rc:0), (table_type:5, version:1702231200914594-1702231200914594-1702231229329434, logical_rc:0, physical_rc:0), (table_type:0, version:1702231229329434-1702231229329434-9223372036854775807, logical_rc:0, physical_rc:0)]
Note:
非pk列的索引列访问opertor算子依然叫table scan 而不是index scan, name显示了二级索引名称idx_emp_1,而且显示了隐藏PK __pk_increment 使用了CBO。
指定PK表(employee2) 索引覆盖
obclient [test]> explain extended select id from employee2 where id=1 \G *************************** 1. row *************************** Query Plan: ======================================= |ID|OPERATOR |NAME |EST. ROWS|COST| --------------------------------------- |0 |TABLE GET|employee2|1 |46 | ======================================= Outputs & filters: ------------------------------------- 0 - output([employee2.id(0x7fabd0a059e0)]), filter(nil), access([employee2.id(0x7fabd0a059e0)]), partitions(p0), is_index_back=false, range_key([employee2.id(0x7fabd0a059e0)]), range[1 ; 1], range_cond([employee2.id(0x7fabd0a059e0) = 1(0x7fabd0a052c0)]) Used Hint: ------------------------------------- /*+ */ Outline Data: ------------------------------------- /*+ BEGIN_OUTLINE_DATA FULL(@"SEL$1" "test.employee2"@"SEL$1") END_OUTLINE_DATA */ Plan Type: ------------------------------------- LOCAL Optimization Info: ------------------------------------- employee2:table_rows:20000, physical_range_rows:1, logical_range_rows:1, index_back_rows:0, output_rows:1, est_method:local_storage, optimization_method=rule_based, heuristic_rule=unique_index_without_indexback
NOTE:
pk列的索引列访问opertor算子叫TABLE GET. TABLE SCAN 算子的 operator 有两种形式:TABLE SCAN 和 TABLE GET,TABLE GET 直接用主键定位,TABLE SCAN 属于范围扫描。
ID列就是PK, 执行计划中没有显示任何 pk信息, 同时outline 显示 FULL hint, 但是optimizetion部分又显示 表有2w行,但范围rows就1, 但是使用的RBO,而且是unique_index_without_indexback。
outline 是错误的。
未指定PK表(employee1) 非索引覆盖列
obclient [test]> explain extended select firstname from employee1 where id=1 \G *************************** 1. row *************************** Query Plan: =================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| --------------------------------------------------- |0 |TABLE SCAN|employee1(idx_emp_1)|1 |92 | =================================================== Outputs & filters: ------------------------------------- 0 - output([employee1.FirstName(0x7fb275c06110)]), filter(nil), access([employee1.FirstName(0x7fb275c06110)]), partitions(p0), is_index_back=true, range_key([employee1.id(0x7fb275c05a20)], [employee1.__pk_increment(0x7fb275c46d90)]), range(1,MIN ; 1,MAX), range_cond([employee1.id(0x7fb275c05a20) = 1(0x7fb275c05300)]) Used Hint: ------------------------------------- /*+ */ Outline Data: ------------------------------------- /*+ BEGIN_OUTLINE_DATA INDEX(@"SEL$1" "test.employee1"@"SEL$1" "idx_emp_1") END_OUTLINE_DATA */ Plan Type: ------------------------------------- LOCAL Optimization Info: ------------------------------------- employee1:table_rows:20000, physical_range_rows:1, logical_range_rows:1, index_back_rows:1, output_rows:1, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[idx_emp_1], pruned_index_name[idx_emp_name1], unstable_index_name[employee1], estimation info[table_id:1099511677791, (table_type:1, version:0-1702231210412936-1702231210412936, logical_rc:1, physical_rc:1), (table_type:7, version:1702231200914594-1702231200914594-1702231229329434, logical_rc:0, physical_rc:0), (table_type:5, version:1702231200914594-1702231200914594-1702231229329434, logical_rc:0, physical_rc:0), (table_type:0, version:1702231229329434-1702231229329434-9223372036854775807, logical_rc:0, physical_rc:0)]
Note:
算是预期内的,因为有firstname列要回表,is_index_back=true。
指定PK表(employee1) 非索引覆盖列
obclient [test]> explain extended select firstname from employee2 where id=1 \G
*************************** 1. row ***************************
Query Plan: =======================================
|ID|OPERATOR |NAME |EST. ROWS|COST|
---------------------------------------
|0 |TABLE GET|employee2|1 |46 |
=======================================
Outputs & filters:
-------------------------------------
0 - output([employee2.FirstName(0x7fb10a006110)]), filter(nil),
access([employee2.FirstName(0x7fb10a006110)]), partitions(p0),
is_index_back=false,
range_key([employee2.id(0x7fb10a005a20)]), range[1 ; 1],
range_cond([employee2.id(0x7fb10a005a20) = 1(0x7fb10a005300)])
Used Hint:
-------------------------------------
/*+
*/
Outline Data:
-------------------------------------
/*+
BEGIN_OUTLINE_DATA
FULL(@"SEL$1" "test.employee2"@"SEL$1")
END_OUTLINE_DATA
*/
Plan Type:
-------------------------------------
LOCAL
Optimization Info:
-------------------------------------
employee2:table_rows:20000, physical_range_rows:1, logical_range_rows:1, index_back_rows:0, output_rows:1, est_method:local_storage,
optimization_method=rule_based, heuristic_rule=unique_index_without_indexback
Note:
对于表及索引的IOT显示is_index_back=false, outline同样是full不能理解,一样是RBO 。
“如果没有索引的名字,则说明执行的是主表扫描。这里需要注意,在 OceanBase 数据库中,主表和索引的组织结构是一样的,主表本身也是一个索引。”
未指定PK表(employee1) 索引列全扫
obclient [test]> explain extended select sum(id) from employee1 \G *************************** 1. row *************************** Query Plan: ======================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| -------------------------------------------------------- |0 |SCALAR GROUP BY| |1 |8500| |1 | TABLE SCAN |employee1(idx_emp_1)|20000 |7737| ======================================================== Outputs & filters: ------------------------------------- 0 - output([T_FUN_SUM(employee1.id(0x7fb808c05070))(0x7fb808c04950)]), filter(nil), group(nil), agg_func([T_FUN_SUM(employee1.id(0x7fb808c05070))(0x7fb808c04950)]) 1 - output([employee1.id(0x7fb808c05070)]), filter(nil), access([employee1.id(0x7fb808c05070)]), partitions(p0), is_index_back=false, range_key([employee1.id(0x7fb808c05070)], [employee1.__pk_increment(0x7fb808c499f0)]), range(MIN,MIN ; MAX,MAX)always true
指定PK表(employee2) 索引列全扫
obclient [test]> show index from employee2; +-----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Visible | +-----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+ | employee2 | 0 | PRIMARY | 1 | id | A | NULL | NULL | NULL | | BTREE | available | | YES | | employee2 | 1 | idx_emp_name2 | 1 | LastName | A | NULL | NULL | NULL | YES | BTREE | available | | YES | +-----------+------------+---------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+-----------+---------------+---------+ obclient [test]> explain extended select sum(id) from employee2 \G *************************** 1. row *************************** Query Plan: ============================================================ |ID|OPERATOR |NAME |EST. ROWS|COST| ------------------------------------------------------------ |0 |SCALAR GROUP BY| |1 |8500| |1 | TABLE SCAN |employee2(idx_emp_name2)|20000 |7737| ============================================================ Outputs & filters: ------------------------------------- 0 - output([T_FUN_SUM(employee2.id(0x7fac4c005070))(0x7fac4c004950)]), filter(nil), group(nil), agg_func([T_FUN_SUM(employee2.id(0x7fac4c005070))(0x7fac4c004950)]) 1 - output([employee2.id(0x7fac4c005070)]), filter(nil), access([employee2.id(0x7fac4c005070)]), partitions(p0), is_index_back=false, range_key([employee2.LastName(0x7fac4c04a390)], [employee2.id(0x7fac4c005070)]), range(MIN,MIN ; MAX,MAX)always true Used Hint: ------------------------------------- /*+ */ Outline Data: ------------------------------------- /*+ BEGIN_OUTLINE_DATA INDEX(@"SEL$1" "test.employee2"@"SEL$1" "idx_emp_name2") END_OUTLINE_DATA */ Plan Type: ------------------------------------- LOCAL Optimization Info: ------------------------------------- employee2:table_rows:20000, physical_range_rows:20000, logical_range_rows:20000, index_back_rows:0, output_rows:20000, est_method:local_storage, optimization_method=cost_based, avaiable_index_name[idx_emp_name2,employee2], estimation info[table_id:1099511677793, (table_type:1, version:0-1702283757163029-1702283757163029, logical_rc:20000, physical_rc:20000), (table_type:0, version:1702231229122664-1702231229122664-9223372036854775807, logical_rc:0, physical_rc:0)]
Note:
我在两个表的lastname列都创建了索引,sum(id)列但OB“莫名其妙”的使用了name列上的索引,但回神一想,如果NAME列索引idx_emp_name2存储行位置值(如oracle的ROWID)应该是主键列的ID值。而如果只读PK索引是否就是全表扫(表及索引)? 这么算扫描[PK VALUE]+NAME的二级索引代价可能更低吧。
— over —
对不起,这篇文章暂时关闭评论。