COLLATE影响PostgreSQL/openGauss数据库的排序与索引使用
当创建PostgreSQL数据库时,可以根据业务需要配置不同数据库的collation , collation影响包括字符串排序顺序、字符分类方法、数值格式、日期和时间格式以及货币格式。 此外,您可能还需要配置 LC_COLLATE 和 LC_CTYPE 环境变量。 在创建库、创建表、创建索引、查询时也可以指定该collation,不同的collation可能会产生不同的排序,甚至导致无法使用索引。
postgresql支持的字符集
https://www.postgresql.org/docs/15/multibyte.html Server列为YES支持服务端,否则仅支持客户端。
当前字符库支持的LC_COLLATE, LC_CTYPE
LC_COLLATE, LC_CTYPE当不指定时默认的,但要与当煎数据库的字符集兼容, 您可以执行以下 SQL 语句从 pg_collation 系统表中查询字符集支持的 LC_COLLATE 和 LC_CTYPE 设置:
[local]:5432 postgres@db1=# select distinct pg_encoding_to_char(collencoding) from pg_collation; pg_encoding_to_char --------------------- EUC_CN LATIN7 ISO_8859_7 WIN1251 ISO_8859_6 LATIN3 LATIN6 LATIN5 ISO_8859_8 LATIN9 LATIN1 LATIN2 EUC_TW UTF8 EUC_KR KOI8U KOI8R ISO_8859_5 WIN1255 LATIN8 EUC_JP (22 rows) [local]:5432 postgres@db1=# select pg_encoding_to_char(collencoding) as encoding,collname,collcollate,collctype, collencoding from pg_collation where pg_encoding_to_char(collencoding) like '%EUC_CN%'; encoding | collname | collcollate | collctype | collencoding ----------+--------------+--------------+--------------+-------------- EUC_CN | zh_CN | zh_CN | zh_CN | 2 EUC_CN | zh_CN.gb2312 | zh_CN.gb2312 | zh_CN.gb2312 | 2 EUC_CN | zh_SG | zh_SG | zh_SG | 2 EUC_CN | zh_SG.gb2312 | zh_SG.gb2312 | zh_SG.gb2312 | 2 (4 rows) [local]:5432 postgres@db1=# select pg_encoding_to_char(collencoding) as encoding,collname,collcollate,collctype from pg_collation ; encoding | collname | collcollate | collctype ------------+-----------------------+-----------------------+----------------------- | default | | | C | C | C | POSIX | POSIX | POSIX UTF8 | ucs_basic | C | C LATIN1 | aa_DJ | aa_DJ | aa_DJ LATIN1 | aa_DJ.iso88591 | aa_DJ.iso88591 | aa_DJ.iso88591 UTF8 | aa_DJ.utf8 | aa_DJ.utf8 | aa_DJ.utf8 UTF8 | aa_ER | aa_ER | aa_ER UTF8 | aa_ER.utf8 | aa_ER.utf8 | aa_ER.utf8 ...
Note:
ENCODING是null 表示所有字符集都支持.
创建数据库时指定COLLATE
CREATE DATABASE hrdb WITH ENCODING 'UTF8' LC_COLLATE='C.UTF-8' LC_CTYPE='C.UTF-8' TEMPLATE = template0 CONNECTION LIMIT = 200 --IS_TEMPLATE = TRUE;
note:
一但创建不支持alter database修改。建议 LC_COLLATE使用“C”,性能上有些提升.
创建表时指定COLLATE
CREATE TABLE test1 ( a text COLLATE "de_DE", b text COLLATE "es_ES", ... );
note:
后期可以修改列 collate,如 alter table a alter c1 type text COLLATE “zh_CN”;
创建索引时指定COLLATE
CREATE INDEX testc_name_idx3 ON testc(name COLLATE "C")
Note:
可以在同一列创建不同collate索引。
查询时指定collate
SELECT * FROM t WHERE name >= 'z' COLLATE "C"; create index idxa on a(c1 collate "zh_CN"); explain select * from a order by c1 collate "zh_CN";
Note:
如果查询使用的collate与索引collate不同,可能无法使用索引.
当前locale
[postgres@oel7db1 ~]$ env|grep LAN LANG=en_US.UTF-8 [postgres@oel7db1 ~]$ locale LANG=en_US.UTF-8 LC_CTYPE="en_US.UTF-8" LC_NUMERIC="en_US.UTF-8" LC_TIME="en_US.UTF-8" LC_COLLATE="en_US.UTF-8" LC_MONETARY="en_US.UTF-8" LC_MESSAGES="en_US.UTF-8" LC_PAPER="en_US.UTF-8" LC_NAME="en_US.UTF-8" LC_ADDRESS="en_US.UTF-8" LC_TELEPHONE="en_US.UTF-8" LC_MEASUREMENT="en_US.UTF-8" LC_IDENTIFICATION="en_US.UTF-8" LC_ALL=
查看数据库的locale
[local]:5432 postgres@db1=# \l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges -----------+----------+-----------+------------+------------+----------------------- anbob | postgres | SQL_ASCII | C | C | db | postgres | SQL_ASCII | C | C | db1 | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | postgres | postgres | SQL_ASCII | C | C | template0 | postgres | SQL_ASCII | C | C | =c/postgres + | | | | | postgres=CTc/postgres template1 | postgres | SQL_ASCII | C | C | =c/postgres + | | | | | postgres=CTc/postgres (6 rows)
collate对排序的影响
[local]:5432 postgres@db1=# select * from (values ('Alice'),('Tom'),('anbob')) as a(c1) order by c1 ; c1 ------- Alice anbob Tom (3 rows) [local]:5432 postgres@db1=# select * from (values ('Alice'),('Tom'),('anbob')) as a(c1) order by c1 collate "C"; c1 ------- Alice Tom anbob (3 rows) [local]:5432 postgres@db1=# select * from (values ('Alice'),('Tom'),('anbob')) as a(c1) order by c1 collate "zh_CN"; c1 ------- Alice anbob Tom (3 rows)
collate对索引的影响
[local]:5432 postgres@postgres=# CREATE DATABASE "example_db" postgres-# WITH OWNER "postgres" postgres-# ENCODING 'UTF8' postgres-# LC_COLLATE = 'en_US.UTF8' postgres-# LC_CTYPE = 'en_US.UTF8' postgres-# TEMPLATE template0; CREATE DATABASE create table testc(id int,name varchar(20),addr varchar(3000),otype char(1)); insert into testc select x,x||'anbob',rpad('x',2000,'x'),'t' from generate_series(1,10000) as x; [local]:5432 postgres@example_db=# create index on testc(name); CREATE INDEX [local]:5432 postgres@example_db=# \d testc Table "public.testc" Column | Type | Collation | Nullable | Default --------+-------------------------+-----------+----------+--------- id | integer | | | name | character varying(20) | | | addr | character varying(3000) | | | otype | character(1) | | | Indexes: "testc_name_idx" btree (name) [local]:5432 postgres@example_db=# analyze testc; ANALYZE [local]:5432 postgres@example_db=# explain select * from testc where name like '1a%'; QUERY PLAN -------------------------------------------------------- Seq Scan on testc (cost=0.00..229.00 rows=1 width=50) Filter: ((name)::text ~~ '1a%'::text) (2 rows)
NOTE:
使用全表扫,即使选择率很好,可以创建xx_pattern_ops索引.
创建xx_pattern_ops索引
[local]:5432 postgres@example_db=# CREATE INDEX testc_name_idx2 ON testc(name varchar_pattern_ops); CREATE INDEX [local]:5432 postgres@example_db=# \l testc List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges ------+-------+----------+---------+-------+------------------- (0 rows) [local]:5432 postgres@example_db=# \d testc Table "public.testc" Column | Type | Collation | Nullable | Default --------+-------------------------+-----------+----------+--------- id | integer | | | name | character varying(20) | | | addr | character varying(3000) | | | otype | character(1) | | | Indexes: "testc_name_idx" btree (name) "testc_name_idx2" btree (name varchar_pattern_ops) [local]:5432 postgres@example_db=# explain select * from testc where name like '1a%'; QUERY PLAN ---------------------------------------------------------------------------------- Index Scan using testc_name_idx2 on testc (cost=0.29..8.31 rows=1 width=50) Index Cond: (((name)::text ~>=~ '1a'::text) AND ((name)::text ~<~ '1b'::text)) Filter: ((name)::text ~~ '1a%'::text) (3 rows)
也可以创建COLLATE “C”索引
drop index testc_name_idx2; CREATE INDEX testc_name_idx3 ON testc(name COLLATE "C") [local]:5432 postgres@example_db=# explain select * from testc where name like '1a%'; QUERY PLAN ------------------------------------------------------------------------------ Index Scan using testc_name_idx3 on testc (cost=0.29..8.31 rows=1 width=50) Index Cond: (((name)::text >= '1a'::text) AND ((name)::text < '1b'::text)) Filter: ((name)::text ~~ '1a%'::text) (3 rows)
xx_pattern_ops and COLLATE “C”不同?
CREATE INDEX ON t(name text/varchar_pattern_ops) 和CREATE INDEX ON t(name COLLATE “C”)都可以使用索引,请注意,如果您希望涉及普通 <、<=、> 或 >= 比较的查询使用索引,您还应该使用默认运算符类创建索引。 此类查询不能使用 xxx_pattern_ops 运算符类。 (但是,普通的相等比较可以使用这些运算符类。)可以使用不同的运算符类在同一列上创建多个索引。 如果您确实使用 C 语言环境,则不需要 xxx_pattern_ops 运算符类,因为具有默认运算符类的索引可用于 C 语言环境中的模式匹配查询。
使用Collate=C的DB
[local]:5432 postgres@example_db=# \l List of databases Name | Owner | Encoding | Collate | Ctype | Access privileges ------------+----------+-----------+------------+------------+----------------------- anbob | postgres | SQL_ASCII | C | C | db | postgres | SQL_ASCII | C | C | example_db | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | postgres | postgres | SQL_ASCII | C | C | template0 | postgres | SQL_ASCII | C | C | =c/postgres + | | | | | postgres=CTc/postgres template1 | postgres | SQL_ASCII | C | C | =c/postgres + | | | | | postgres=CTc/postgres (6 rows) [local]:5432 postgres@example_db=# \c anbob You are now connected to database "anbob" as user "postgres". [local]:5432 postgres@anbob=# create table testc(id int,name varchar(20),addr varchar(3000),otype char(1)); CREATE TABLE [local]:5432 postgres@anbob=# insert into testc select x,x||'anbob',rpad('x',2000,'x'),'t' from generate_series(1,10000) as x; INSERT 0 10000 [local]:5432 postgres@anbob=# create index on testc(name); CREATE INDEX [local]:5432 postgres@anbob=# \d testc Table "public.testc" Column | Type | Collation | Nullable | Default --------+-------------------------+-----------+----------+--------- id | integer | | | name | character varying(20) | | | addr | character varying(3000) | | | otype | character(1) | | | Indexes: "testc_name_idx" btree (name) [local]:5432 postgres@anbob=# explain select * from testc where name like '1a%'; QUERY PLAN ------------------------------------------------------------------------------ Index Scan using testc_name_idx on testc (cost=0.29..9.66 rows=1 width=50) Index Cond: (((name)::text >= '1a'::text) AND ((name)::text < '1b'::text)) Filter: ((name)::text ~~ '1a%'::text) (3 rows)
Note:
默认不需要指定collate和创建xxx_pattern_ops索引 。
— over —
对不起,这篇文章暂时关闭评论。