Oracle、MySQL、PostgreSQL、OceanBase、万里开源数据库比较系列(十七): IN ( MAX Subquery )
在关系数据库中两表关联在oracle中使用IN( subquery)的语法很常见, 但kevin发现在MySQL中subquery使用MAX聚合参数时,会导致主查询Full scan而无法使用索引范围扫描, 当遇到大表时可能性能下降明显 ,测试发现有的库使用子查询做为驱动表,有的是使用filter从主查询过滤。下面简单测试。
先看oracle
QL> CREATE TABLE employee1 (id int,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255));
Table created.
SQL> insert into employee1 select rownum,'anbob'||level,'weejar','xxxxxxxxxxx','yyyyyyy' from dual connect by rownum<=100000; 100000 rows created. SQL> create table employee2 as select * from employee1;
Table created.
SQL> create index idx_emp_id1 on employee1(id);
Index created.
SQL> create index idx_emp_id2 on employee2(id);
Index created.
SQL> create index idx_emp_lname1 on employee1(lastname);
Index created.
SQL> create index idx_emp_lname2 on employee2(lastname);
Index created.
SQL> explain plan for select * from employee2 where id in(select max(id) from employee1 where lastname='anbob1');
Explained.
SQL> @x2
PLAN_TABLE_OUTPUT
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 616252811
---------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
---------------------------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 43 | 4 (0)| 00:00:01 |
| 1 | TABLE ACCESS BY INDEX ROWID BATCHED | EMPLOYEE2 | 1 | 43 | 2 (0)| 00:00:01 |
|* 2 | INDEX RANGE SCAN | IDX_EMP_ID2 | 1 | | 1 (0)| 00:00:01 |
| 3 | SORT AGGREGATE | | 1 | 16 | | |
| 4 | TABLE ACCESS BY INDEX ROWID BATCHED| EMPLOYEE1 | 1 | 16 | 2 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | IDX_EMP_LNAME1 | 1 | | 1 (0)| 00:00:01 |
---------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - access("ID"= (SELECT MAX("ID") FROM "EMPLOYEE1" "EMPLOYEE1" WHERE "LASTNAME"='anbob1'))
5 - access("LASTNAME"='anbob1')
Note:
执行计划可阅读性非常棒,不解释。
再看Postgresql
-- 表结构与构建数据省略 weejar=# explain select id from employee2 where id in(select max(id) from employee1); QUERY PLAN --------------------------------------------------------------------------------------------------------------- Nested Loop (cost=1.76..10.31 rows=1 width=4) -> Result (cost=1.62..1.63 rows=1 width=4) InitPlan 1 (returns $0) -> Limit (cost=0.14..1.62 rows=1 width=4) -> Index Only Scan Backward using idx_emp_1 on employee1 (cost=0.14..44.66 rows=30 width=4) Index Cond: (id IS NOT NULL) -> Index Only Scan using employee2_pkey on employee2 (cost=0.14..8.15 rows=1 width=4) Index Cond: (id = ($0)) (8 行记录)
Note:
Postgresql也和oracle一样符合预期。
MySQL社区版v8.0
[root@oel7db1 ~]# mysql -uroot -pwww.anbob.com -h127.0.0.1 -P3306 anbob mysql: [Warning] Using a password on the command line interface can be insecure. Reading table information for completion of table and column names You can turn off this feature to get a quicker startup with -A Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 8 Server version: 8.0.20-commercial MySQL Enterprise Server - Commercial Copyright (c) 2000, 2020, Oracle and/or its affiliates. All rights reserved. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MYSQL_root@127.0.0.1 [anbob]> use anbob; Database changed MYSQL_root@127.0.0.1 [anbob]> CREATE TABLE employee1 (id int,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255)); Query OK, 0 rows affected (0.17 sec) MYSQL_root@127.0.0.1 [anbob]> CREATE TABLE employee2 (id int primary key,LastName varchar(255),FirstName varchar(255),Address varchar(255),profile varchar(255)); Query OK, 0 rows affected (0.09 sec) MYSQL_root@127.0.0.1 [anbob]> create index idx_emp_1 on employee1(id); Query OK, 0 rows affected (0.09 sec) Records: 0 Duplicates: 0 Warnings: 0 MYSQL_root@127.0.0.1 [anbob]> DELIMITER $$ MYSQL_root@127.0.0.1 [anbob]> CREATE PROCEDURE BulkInsert() -> BEGIN -> DECLARE i INT DEFAULT 1; -> truncate table employee1; -> truncate table employee2; -> WHILE (i <= 20000) DO -> INSERT INTO employee1 (id, FirstName, Address) VALUES(i, CONCAT("user","-",i), CONCAT("address","-",i)); -> INSERT INTO employee2 (id,FirstName, Address) VALUES(i,CONCAT("user","-",i), CONCAT("address","-",i)); -> SET i = i+1; -> END WHILE; -> END -> $$ Query OK, 0 rows affected (0.12 sec) MYSQL_root@127.0.0.1 [anbob]> DELIMITER ; MYSQL_root@127.0.0.1 [anbob]> call BulkInsert(); -- 吐槽, MYSQL这存储过程调用真是慢到怀疑人生。 MYSQL_root@127.0.0.1 [anbob]> explain select id from employee1 where id in(select max(id) from employee2) ; +----+--------------------+-----------+------------+-------+---------------+---------------+---------+------+-------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-----------+------------+-------+---------------+---------------+---------+------+-------+----------+------------------------------+ | 1 | PRIMARY | employee1 | NULL | index | NULL | idx_emp_name1 | 1023 | NULL | 20160 | 100.00 | Using where; Using index | | 2 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+--------------------+-----------+------------+-------+---------------+---------------+---------+------+-------+----------+------------------------------+ 2 rows in set, 1 warning (0.00 sec) MYSQL_root@127.0.0.1 [anbob]> explain format=tree select id from employee1 where id in(select max(id) from employee2) \G *************************** 1. row *************************** EXPLAIN: -> Filter: (employee1.id,(select #2)) (cost=2111.85 rows=20160) -> Index scan on employee1 using idx_emp_name1 (cost=2111.85 rows=20160) -> Select #2 (subquery in condition; dependent) -> Filter: ((employee1.id) = (max(employee2.id))) -> Rows fetched before execution 1 row in set (0.00 sec) MYSQL_root@127.0.0.1 [anbob]> create index idx_emp_name1 on employee1(firstname); MYSQL_root@127.0.0.1 [anbob]> create index idx_emp_name2 on employee2(firstname); MYSQL_root@127.0.0.1 [anbob]> explain select id from employee1 where id in(select max(id) from employee2 where firstname='user-1'); +----+--------------------+-----------+------------+-------+---------------+---------------+---------+------+-------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-----------+------------+-------+---------------+---------------+---------+------+-------+----------+------------------------------+ | 1 | PRIMARY | employee1 | NULL | index | NULL | idx_emp_name1 | 1023 | NULL | 20160 | 100.00 | Using where; Using index | | 2 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+--------------------+-----------+------------+-------+---------------+---------------+---------+------+-------+----------+------------------------------+ 2 rows in set, 1 warning (0.00 sec) -------------------------+ -> Filter: (employee1.id,(select #2)) (cost=2111.85 rows=20160) -> Index scan on employee1 using idx_emp_name1 (cost=2111.85 rows=20160) -> Select #2 (subquery in condition; dependent) -> Filter: ((employee1.id) = (max(employee2.id))) -> Rows fetched before execution +--------------------------------------------------------- MYSQL_root@127.0.0.1 [anbob]> explain select * from employee1 where id in(select max(id) from employee2 where firstname IN('user-1','user-2')) ; +----+-------------+-----------+------------+-------+---------------+---------------+---------+------+-------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+-------+---------------+---------------+---------+------+-------+----------+--------------------------+ | 1 | PRIMARY | employee1 | NULL | ALL | NULL | NULL | NULL | NULL | 20160 | 100.00 | Using where | | 2 | SUBQUERY | employee2 | NULL | range | idx_emp_name2 | idx_emp_name2 | 1023 | NULL | 2 | 100.00 | Using where; Using index | +----+-------------+-----------+------------+-------+---------------+---------------+---------+------+-------+----------+--------------------------+ 2 rows in set, 1 warning (0.00 sec) MYSQL_root@127.0.0.1 [anbob]> explain select * from employee1 where id in(select id from employee2 where firstname IN('user-1','user-2')) ; +----+-------------+-----------+------------+--------+-----------------------+---------------+---------+--------------------+------+----------+--------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+--------+-----------------------+---------------+---------+--------------------+------+----------+--------------------------+ | 1 | SIMPLE | employee2 | NULL | range | PRIMARY,idx_emp_name2 | idx_emp_name2 | 1023 | NULL | 2 | 100.00 | Using where; Using index | | 1 | SIMPLE | employee1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | anbob.employee2.id | 1 | 100.00 | NULL | +----+-------------+-----------+------------+--------+-----------------------+---------------+---------+--------------------+------+----------+--------------------------+ 2 rows in set, 1 warning (0.01 sec) -- 如果没有聚合函数MAX ,是用子查询做为驱动表。同时emp1 表也使用上了索引. MYSQL_root@127.0.0.1 [anbob]> explain analyze select * from employee1 where id in(select max(id) from employee2 where firstname IN('user-1','user-2')) \G *************************** 1. row *************************** EXPLAIN: -> Filter: (employee1.id,employee1.id in (select #2)) (cost=2111.85 rows=20160) (actual time=0.086..65.882 rows=1 loops=1) -> Table scan on employee1 (cost=2111.85 rows=20160) (actual time=0.036..26.958 rows=20000 loops=1) -> Select #2 (subquery in condition; run only once) -> Filter: ((employee1.id = ``.`max(id)`)) (actual time=0.001..0.001 rows=0 loops=20001) -> Limit: 1 row(s) (actual time=0.001..0.001 rows=0 loops=20001) -> Index lookup on using (max(id)=employee1.id) (actual time=0.001..0.001 rows=0 loops=20001) -> Materialize with deduplication (actual time=0.001..0.001 rows=0 loops=20001) -> Aggregate: max(employee2.id) (actual time=0.032..0.032 rows=1 loops=1) -> Filter: (employee2.FirstName in ('user-1','user-2')) (cost=1.50 rows=2) (actual time=0.019..0.028 rows=2 loops=1) -> Index range scan on employee2 using idx_emp_name2 (cost=1.50 rows=2) (actual time=0.017..0.026 rows=2 loops=1) 1 row in set (0.07 sec) MYSQL_root@127.0.0.1 [anbob]> explain analyze select * from employee1 where id in(select id from employee2 where firstname IN('user-1','user-2')) \G *************************** 1. row *************************** EXPLAIN: -> Nested loop inner join (cost=3.68 rows=2) (actual time=0.028..0.040 rows=2 loops=1) -> Filter: (employee2.FirstName in ('user-1','user-2')) (cost=1.50 rows=2) (actual time=0.018..0.026 rows=2 loops=1) -> Index range scan on employee2 using idx_emp_name2 (cost=1.50 rows=2) (actual time=0.016..0.024 rows=2 loops=1) -> Single-row index lookup on employee1 using PRIMARY (id=employee2.id) (cost=1.04 rows=1) (actual time=0.006..0.006 rows=1 loops=2) 1 row in set (0.01 sec)
Note:
MySQL中子查询使用了MAX聚合函数,导致主查询全表扫,如果表较大时,性能会变化明显。
从MySQL 8.0.21开始再可以支持子查询做为驱动表。 SET SESSION optimizer_switch=”subquery_to_derived=ON”;
MYSQL_root@127.0.0.1 [anbob]> show variables like '%optimizer_switch%'\G; *************************** 1. row *************************** Variable_name: optimizer_switch Value: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,engine_condition_pushdown=on,index_condition_pushdown=on,mrr=on,mrr_cost_based=on,block_nested_loop=on,batched_key_access=off,materialization=on,semijoin=on,loosescan=on,firstmatch=on,duplicateweedout=on,subquery_materialization_cost_based=on,use_index_extensions=on,condition_fanout_filter=on,derived_merge=on,use_invisible_indexes=off,skip_scan=on,hash_join=on 1 row in set (0.00 sec) 而之前只能改写SQL使用join MYSQL_root@127.0.0.1 [anbob]> explain -> select * from -> employee1 e1 -> join employee2 e2 on e2.id=e1.id -> where e2.firstname IN('user-1','user-2'); +----+-------------+-------+------------+--------+-----------------------+---------------+---------+-------------+------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+-----------------------+---------------+---------+-------------+------+----------+-----------------------+ | 1 | SIMPLE | e2 | NULL | range | PRIMARY,idx_emp_name2 | idx_emp_name2 | 1023 | NULL | 2 | 100.00 | Using index condition | | 1 | SIMPLE | e1 | NULL | eq_ref | PRIMARY | PRIMARY | 4 | anbob.e2.id | 1 | 100.00 | NULL | +----+-------------+-------+------------+--------+-----------------------+---------------+---------+-------------+------+----------+-----------------------+ 2 rows in set, 1 warning (0.00 sec)
Note:
在MYSQL中还是老老实时的单表查询吧,优化器有点弱,相比oracle差的太远。
看一下OceanBase for MySQL模式.
-- 构建数据同MYSQL完全一样。 obclient [test]> explain select id from employee1 where id in(select max(id) from employee2) \G *************************** 1. row *************************** Query Plan: ========================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| ---------------------------------------------------------- |0 |NESTED-LOOP JOIN | |199 |169 | |1 | SUBPLAN SCAN |VIEW1 |1 |46 | |2 | SCALAR GROUP BY| |1 |46 | |3 | SUBPLAN SCAN |VIEW2 |1 |46 | |4 | TABLE SCAN |employee2(Reverse) |1 |46 | |5 | TABLE SCAN |employee1(idx_emp_1)|199 |77 | ========================================================== Outputs & filters: ------------------------------------- 0 - output([employee1.id]), filter(nil), conds(nil), nl_params_([VIEW1.max(id)]) 1 - output([VIEW1.max(id)]), filter(nil), access([VIEW1.max(id)]) 2 - output([T_FUN_MAX(VIEW2.id)]), filter(nil), group(nil), agg_func([T_FUN_MAX(VIEW2.id)]) 3 - output([VIEW2.id]), filter(nil), access([VIEW2.id]) 4 - output([employee2.id]), filter(nil), access([employee2.id]), partitions(p0), limit(1), offset(nil) 5 - output([employee1.id]), filter(nil), access([employee1.id]), partitions(p0) obclient [test]> explain select * from employee2 where id in(select max(id) from employee1 where firstname='user-1') \G *************************** 1. row *************************** Query Plan: ============================================================== |ID|OPERATOR |NAME |EST. ROWS|COST| -------------------------------------------------------------- |0 |NESTED-LOOP JOIN | |1 |114 | |1 | SUBPLAN SCAN |VIEW1 |1 |92 | |2 | SCALAR GROUP BY| |1 |92 | |3 | SUBPLAN SCAN |VIEW2 |1 |92 | |4 | LIMIT | |1 |92 | |5 | TOP-N SORT | |1 |92 | |6 | TABLE SCAN |employee1(idx_emp_name1)|1 |92 | |7 | TABLE GET |employee2 |1 |22 | ============================================================== Outputs & filters: ------------------------------------- 0 - output([employee2.id], [employee2.LastName], [employee2.FirstName], [employee2.Address], [employee2.profile]), filter(nil), conds(nil), nl_params_([VIEW1.max(id)]) 1 - output([VIEW1.max(id)]), filter(nil), access([VIEW1.max(id)]) 2 - output([T_FUN_MAX(VIEW2.id)]), filter(nil), group(nil), agg_func([T_FUN_MAX(VIEW2.id)]) 3 - output([VIEW2.id]), filter(nil), access([VIEW2.id]) 4 - output([employee1.id]), filter(nil), limit(1), offset(nil) 5 - output([employee1.id]), filter(nil), sort_keys([employee1.id, DESC]), topn(1) 6 - output([employee1.id]), filter([(T_OP_IS_NOT, employee1.id, NULL, 0)]), access([employee1.id]), partitions(p0) 7 - output([employee2.id], [employee2.LastName], [employee2.FirstName], [employee2.Address], [employee2.profile]), filter(nil), access([employee2.id], [employee2.LastName], [employee2.FirstName], [employee2.Address], [employee2.profile]), partitions(p0) obclient [test]> explain select * from employee2 where id in(select id from employee1 where firstname='user-1') \G *************************** 1. row *************************** Query Plan: ============================================================= |ID|OPERATOR |NAME |EST. ROWS|COST| ------------------------------------------------------------- |0 |NESTED-LOOP JOIN| |1 |114 | |1 | SUBPLAN SCAN |VIEW2 |1 |92 | |2 | MERGE DISTINCT| |1 |92 | |3 | SORT | |1 |92 | |4 | TABLE SCAN |employee1(idx_emp_name1)|1 |92 | |5 | TABLE GET |employee2 |1 |22 | ============================================================= Outputs & filters: ------------------------------------- 0 - output([employee2.id], [employee2.LastName], [employee2.FirstName], [employee2.Address], [employee2.profile]), filter(nil), conds(nil), nl_params_([VIEW2.VIEW1.id]) 1 - output([VIEW2.VIEW1.id]), filter(nil), access([VIEW2.VIEW1.id]) 2 - output([employee1.id]), filter(nil), distinct([employee1.id]) 3 - output([employee1.id]), filter(nil), sort_keys([employee1.id, ASC]) 4 - output([employee1.id]), filter(nil), access([employee1.id]), partitions(p0) 5 - output([employee2.id], [employee2.LastName], [employee2.FirstName], [employee2.Address], [employee2.profile]), filter(nil), access([employee2.id], [employee2.LastName], [employee2.FirstName], [employee2.Address], [employee2.profile]), partitions(p0)
Note:
在OB for mysql中也是支持子查询做为驱动表的。在for oracle租户执行计划是一样的。
GreatDB万里开源数据库
GreatDB Cluster[test]> explain format=tree select id from employee1 where id in(select max(id) from employee2 where firstname='user-1'); +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | EXPLAIN | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ | -> Filter: (employee1.id,(select #2)) (cost=14.99 rows=111) -> Index scan on employee1 using idx_emp_name1 {[datanode3:datanode3_1]: SELECT `id`, `_hidden_pk_` FROM `test`.`employee1`;} (cost=14.99 rows=111) -> Select #2 (subquery in condition; dependent) -> Filter: ((employee1.id) = (max(employee2.id))) -> Rows fetched before execution (cost=0.00..0.00 rows=1) | +-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+ 1 row in set (0.00 sec) GreatDB Cluster[test]> explain select id from employee1 where id in(select max(id) from employee2 where firstname='user-1'); +----+--------------------+-----------+------------+-------+---------------+---------------+---------+------+------+----------+------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+--------------------+-----------+------------+-------+---------------+---------------+---------+------+------+----------+------------------------------+ | 1 | PRIMARY | employee1 | NULL | index | NULL | idx_emp_name1 | 1023 | NULL | 111 | 100.00 | Using where; Using index | | 2 | DEPENDENT SUBQUERY | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | NULL | Select tables optimized away | +----+--------------------+-----------+------------+-------+---------------+---------------+---------+------+------+----------+------------------------------+ 2 rows in set, 1 warning (0.01 sec) GreatDB Cluster[test]> explain select * from employee1 where id in(select max(id) from employee2 where firstname IN('user-1','user-2')) ; +----+-------------+-----------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-----------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------------------------------------------------------------+ | 1 | PRIMARY | employee1 | NULL | ALL | NULL | NULL | NULL | NULL | 111 | 100.00 | Using where | | 2 | SUBQUERY | employee2 | NULL | range | idx_emp_name2 | idx_emp_name2 | 1023 | NULL | 20 | 100.00 | Using pushed condition (`test`.`employee2`.`FirstName` in ('user-1','user-2')); Using index | +----+-------------+-----------+------------+-------+---------------+---------------+---------+------+------+----------+---------------------------------------------------------------------------------------------+ 2 rows in set, 1 warning (0.00 sec) GreatDB Cluster[test]>
Note:
万里数据库并没有把子查询做为驱动表,仿佛把subquery的值以filter的形式传给了主查询employee1 ,并且使用的索引,与oracle的执行路径接近,而不是像mysql改写后或pg一样子查询驱动的方式。
References
https://www.modb.pro/db/1730045837189931008
对不起,这篇文章暂时关闭评论。