Oracle、MySQL、PostGreSQL、SQL Server数据库比较系列(一):null value
最近几年数据库市场百花齐放,在做跨数据库迁移的数据库选型时,除了性能、稳定、安全、运维、功能、可扩展外,像开发中对于值的处理往往容易被人忽视, 之前写过一篇关于PG区别Oracle在SQL解析缓存的笔记《PostgreSQL 12 : Prepare statement和plan_cache_mode 参数》,这里记录一下null 值在这几个数据库中的区别。
软件版本:Oracle 21c 、SQL Server 2019 、MySQL 8.0 、Mariadb 10.6 、PostGreSQL 13、OpenGauss 2.0
创建测试用例表
CREATE TABLE tab_null(id int, name char(10)); INSERT INTO tab_null VALUES(1,'anbob'); INSERT INTO tab_null VALUES(2,NULL);
测试 数据过滤
# oracle SQL> select * from tab_null where name is null; ID NAME ---------- ---------- 2 SQL> select * from tab_null where name is not null; ID NAME ---------- ---------- 1 anbob SQL> select * from tab_null where name=null; no rows selected SQL> select * from tab_null where null=null; no rows selected # postgresql postgres=# select * from tab_null where name is null; id | name ----+------ 2 | (1 row) postgres=# select * from tab_null where name is not null; id | name ----+------------ 1 | anbob (1 row) postgres=# select * from tab_null where name=null; id | name ----+------ (0 rows) postgres=# select * from tab_null where null=null; id | name ----+------ (0 rows) # MySQL/MariaDB mysql> select * from tab_null where name is null; +------+------+ | id | name | +------+------+ | 2 | NULL | +------+------+ 1 row in set (0.00 sec) mysql> select * from tab_null where name is not null; +------+-------+ | id | name | +------+-------+ | 1 | anbob | +------+-------+ 1 row in set (0.00 sec) mysql> select * from tab_null where name=null; Empty set (0.00 sec) mysql> select * from tab_null where null=null; Empty set (0.00 sec) # SQL Server select * from tab_null where name is null; ID NAME ---------- ---------- 2 select * from tab_null where name is not null; ID NAME ---------- ---------- 1 anbob select * from tab_null where name=null; no rows selected select * from tab_null where null=null; no rows selected
Note:
可见所有数据库的结果是一样的。
另一种另类
# Mysql mysql> select 1 from tab_null where 1 not in (null); Empty set (0.00 sec) mysql> select 1 from tab_null where null not in (null); Empty set (0.00 sec) mysql> select 1 from tab_null where null in (null); Empty set (0.00 sec) mysql> select 1 from tab_null where exists(select null from dual); +---+ | 1 | +---+ | 1 | | 1 | | 1 | +---+ 3 rows in set (0.05 sec)
Note:
这类在4个库返回也是一样的,上面只附了MySQL,不再展示其它库。
唯一约束
# oracle SQL> alter table tab_null add constraint c_tab_null_name_uni unique(name); Table altered. SQL> INSERT INTO tab_null VALUES(3,NULL); 1 row created. SQL> select * from tab_null; ID NAME ---------- ---------- 1 anbob 2 3 # postgresql postgres=# alter table tab_null add constraint c_tab_null_name_uni unique(name); ALTER TABLE postgres=# INSERT INTO tab_null VALUES(3,NULL); INSERT 0 1 postgres=# select * from tab_null; id | name ----+------------ 1 | anbob 2 | 3 | (3 rows) # MySQL/MariaDB mysql> alter table tab_null add constraint c_tab_null_name_uni unique(name); Query OK, 0 rows affected (0.14 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> INSERT INTO tab_null VALUES(3,NULL); Query OK, 1 row affected (0.01 sec) mysql> select * from tab_null; +------+-------+ | id | name | +------+-------+ | 1 | anbob | | 2 | NULL | | 3 | NULL | +------+-------+ 3 rows in set (0.00 sec) # SQL SERVER alter table tab_null add constraint c_tab_null_name_uni unique(name); INSERT INTO tab_null VALUES(3,NULL); Msg 2627 Level 14 State 1 Line 12 Violation of UNIQUE KEY constraint 'c_tab_null_name_uni'. Cannot insert duplicate key in object 'dbo.tab_null'. The duplicate key value is (<NULL>).
Note:
这里只有SQL SERVER提示一个表中的null和null是重复记录, 其它库可以正常insert 多个 null 到有唯一约束的表。
NULL使用索引
# MySQL/MariaDB mysql> alter table tab_null drop constraint c_tab_null_name_uni; Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> create index idx_tab_null_name on tab_null(name); Query OK, 0 rows affected (0.05 sec) Records: 0 Duplicates: 0 Warnings: 0 mysql> explain select * from tab_null where name is null; +----+-------------+----------+------------+------+-------------------+-------------------+---------+-------+------+----------+-----------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+----------+------------+------+-------------------+-------------------+---------+-------+------+----------+-----------------------+ | 1 | SIMPLE | tab_null | NULL | ref | idx_tab_null_name | idx_tab_null_name | 41 | const | 2 | 100.00 | Using index condition | +----+-------------+----------+------------+------+-------------------+-------------------+---------+-------+------+----------+-----------------------+ 1 row in set, 1 warning (0.00 sec) # PostgreSQL postgres=# alter table tab_null drop constraint c_tab_null_name_uni; ALTER TABLE postgres=# create index idx_tab_null_name on tab_null(name); CREATE INDEX postgres=# explain analyze select * from tab_null where name is null; QUERY PLAN --------------------------------------------------------------------------------------------------- Seq Scan on tab_null (cost=0.00..1.03 rows=1 width=18) (actual time=0.008..0.009 rows=2 loops=1) Filter: (name IS NULL) Rows Removed by Filter: 1 Planning Time: 0.130 ms Execution Time: 0.019 ms -- 因为小表代价,这是使用了seq scan 全表扫,下面往表里insert一些数据 postgres=# insert into tab_null select generate_series,generate_series||'anbob' from generate_series(101,10000); INSERT 0 9900 postgres=# analyze tab_null; ANALYZE postgres=# explain analyze select * from tab_null where name is null; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Index Scan using idx_tab_null_name on tab_null (cost=0.29..9.64 rows=2 width=15) (actual time=0.010..0.011 rows=2 loops=1) Index Cond: (name IS NULL) Planning Time: 0.199 ms Execution Time: 0.031 ms (4 rows) # Oracle SQL>alter table tab_null drop constraint c_tab_null_name_uni; Table altered. SQL> create index idx_tab_null_name on tab_null(name); Index created. SQL> explain plan for select /*+index(t)*/ * from tab_null t where name is null; Explained. SQL> @x2 PLAN_TABLE_OUTPUT ------------------------------------------------------------------------------------------------ Plan hash value: 2647411751 ------------------------------------------------------------------------------ | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ------------------------------------------------------------------------------ | 0 | SELECT STATEMENT | | 2 | 50 | 2 (0)| 00:00:01 | |* 1 | TABLE ACCESS FULL| TAB_NULL | 2 | 50 | 2 (0)| 00:00:01 | ------------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter("NAME" IS NULL) Hint Report (identified by operation id / Query Block Name / Object Alias): Total hints for statement: 1 (U - Unused (1)) --------------------------------------------------------------------------- 1 - SEL$1 / "T"@"SEL$1" U - index(t) 解决这个问题可以增加常数的复合索引 SQL> create index idx_tab_null_name_0 on tab_null(name,0); Index created. SQL> explain plan for select /*+index(t)*/ * from tab_null t where name is null; Explained. SQL> @x2 PLAN_TABLE_OUTPUT ----------------------------------------------------------------------------------------------------------- Plan hash value: 2247804559 ----------------------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ----------------------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 2 | 50 | 2 (0)| 00:00:01 | | 1 | TABLE ACCESS BY INDEX ROWID BATCHED| TAB_NULL | 2 | 50 | 2 (0)| 00:00:01 | |* 2 | INDEX RANGE SCAN | IDX_TAB_NULL_NAME_0 | 1 | | 1 (0)| 00:00:01 | ----------------------------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 2 - access("NAME" IS NULL) # SQL SERVER SET statistics profile on; SELECT * FROM tab_null with (index(idx_tab_null_name)) Where name IS NULL;
Rows | Executes | StmtText | StmtId | NodeId | Parent | PhysicalOp | LogicalOp | Argument | DefinedValues | EstimateRows | EstimateIO | EstimateCPU | AvgRowSize | TotalSubtreeCost | OutputList | Warnings | Type | Parallel | EstimateExecutions |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 1 | SELECT * FROM tab_null with (index(idx_tab_null_name)) Where name IS NULL | 1 | 1 | 0 | 1 | 0.00657038 | SELECT | False | ||||||||||
1 | 1 | |–Nested Loops(Inner Join, OUTER REFERENCES:([Bmk1000])) | 1 | 2 | 1 | Nested Loops | Inner Join | OUTER REFERENCES:([Bmk1000]) | 1 | 0 | 4.18E-06 | 21 | 0.00657038 | [fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null].[id], [fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null].[name] | PLAN_ROW | False | 1 | ||
1 | 1 | |--Index Seek(OBJECT:([fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null].[idx_tab_null_name]), SEEK:([fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null].[name]=NULL) ORDERED FORWARD) | 1 | 3 | 2 | Index Seek | Index Seek | OBJECT:([fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null].[idx_tab_null_name]), SEEK:([fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null].[name]=NULL) ORDERED FORWARD, FORCEDINDEX | [Bmk1000], [fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null].[name] | 1 | 0.003125 | 0.0001581 | 25 | 0.0032831 | [Bmk1000], [fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null].[name] | PLAN_ROW | False | 1 | |
1 | 1 | |–RID Lookup(OBJECT:([fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD) | 1 | 5 | 2 | RID Lookup | RID Lookup | OBJECT:([fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null]), SEEK:([Bmk1000]=[Bmk1000]) LOOKUP ORDERED FORWARD | [fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null].[id] | 1 | 0.003125 | 0.0001581 | 11 | 0.0032831 | [fiddle_83bb216b11bf448d8dd49445932f20ee].[dbo].[tab_null].[id] | PLAN_ROW | False | 1 |
Note:
对于is null的谓词条件注意MySQL、MariaDB即使很少的记录也使用的index感觉更像是RULE CBO;PostgreSQL开始使用了全表扫,也可能是基于代价的估算,全表扫要优于使用索引,因为pg默认没有像oracle,sql server, mysql 自带的hint可以强制使用索引,后来我们填充了更多的数据,PostgreSQL也使用上了索引,当然PG 也有扩展pg_hint_plan可以实现,不过PG认为用户中使用hint干扰优化器是不好的习惯,不应该那么做;Oracle数据库因为单列索引不会储存null值,所以is null 即使使用hint 也无法使用索引,需要优化小技巧,增加常量的复合索引使用索引; SQL Server同样我们在加hint with (index(idx_tab_null_name))后,执行计划中的Index Seek 也可以确认用上了索引。
字符串拼接
# sql server 2019 select null+'anbob' ------------- null # mariadb 10.6 select null+'anbob' ------------- null # mysql 8.0 select null+'anbob' ------------- null # postgres 13 postgres=# select 'anbob'||null; ?column? ---------- (1 row) # oracle 21c SQL> select null||'anbob' from dual; NULL| ----- anbob # OpenGauss [og@oel7db1 data]$ gsql -d anbob -p 15432 gsql ((openGauss 2.0.0 build 78689da9) compiled at 2021-03-31 21:04:03 commit 0 last mr ) NOTICE : The password has been expired, please change the password. Non-SSL connection (SSL connection is recommended when requiring high-security) Type "help" for help. ^ anbob=# select null||'anbob' ; ?column? ---------- anbob (1 row)
Note:
注意也是只有Oracle不同于其它库,在null值和其它字符串拼接后可以正常返回部分有效值,而其它所有库全部返回空。
总结:
对于null值的谓词过滤条件时(IS NULL、IS NOT NULL, >,< ,=),4个数据库返回数据记录是一样的;
对于null 值的唯一约束,只有SQL Server不允许null 记录重复,而其它数据库不限制;
对于is null使用Btree索引, 只有Oracle是默认无法索引的(需要创建常量的复合索引),其它数据库优化器在认为合适时可以正常使用索引;
对于null与字符串拼接就更有意思,sql server\postgresql\mysql在与null拼接后返回null, Oracle返回是字符串的值,而且基于Postgresql的OpenGauss返回和Oracle相同也是字符串,看来是与oracle做过兼容性修改。
— 下期继续 —
对不起,这篇文章暂时关闭评论。