MySQL的字符集 character_set相关配置
在 MySQL 中,character_set
配置用于指定数据库、表、列或连接的字符集。字符集决定了如何存储和检索字符数据,例如 UTF-8、latin1 等。MySQL 中有八个与 character_set 相关的配置选项,如下所示。如果不仔细阅读MySQL字符集文档,可能很难知道这些配置选项的用途。此外,对于某些选项,除非进行进一步测试,否则很难知道 MySQL 如何使用它们。
:\mysql-8.0.36-winx64\bin>mysql -uroot -p Enter password: **** Welcome to the MySQL monitor. Commands end with ; or \g. Your MySQL connection id is 8 Server version: 8.0.36 MySQL Community Server - GPL Copyright (c) 2000, 2024, Oracle and/or its affiliates. Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. mysql> show variables like 'character%'; +--------------------------+----------------------------------------+ | Variable_name | Value | +--------------------------+----------------------------------------+ | character_set_client | gbk | | character_set_connection | gbk | | character_set_database | utf8mb4 | | character_set_filesystem | binary | | character_set_results | gbk | | character_set_server | utf8mb4 | | character_set_system | utf8mb3 | | character_sets_dir | D:\mysql-8.0.36-winx64\share\charsets\ | +--------------------------+----------------------------------------+ 8 rows in set, 1 warning (0.01 sec)
修改字符集
设置全局字符集
修改 MySQL 配置文件(如 my.cnf
或 my.ini
):
[mysqld] character-set-server=utf8mb4 collation-server=utf8mb4_unicode_ci
然后重启 MySQL 服务以应用更改:
设置会话字符集
SET NAMES 'utf8mb4'; 等同于同时设置: < pre> SET character_set_client = 'utf8mb4'; SET character_set_connection = 'utf8mb4'; SET character_set_results = 'utf8mb4';
创建数据库指定字符集
CREATE DATABASE anbob CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
字符集常见选项
utf8mb4
:支持完整的 UTF-8,包括表情符号。utf8
:MySQL 的旧版 UTF-8 支持,不支持 4 字节字符。latin1
:单字节字符集,适合西欧语言。- gbk :支持国标字符集。
character_set分组
首先,我们可以将这八个选项分为三组:
1.杂项;
1 character_sets_dir:很简单:安装动态加载的字符集的目录。
2 character_set_filesystem:当 MySQL 需要处理文件时,它需要文件名,并且在打开文件之前,该文件名将从character_set_client 转换为character_set_filesystem。默认值为 binary,这意味着不会发生转换。在大多数情况下,请将其保留为默认值 binary,除非您确实确信需要进行更改。
2. 定义数据(表中的列)的character_set;
MySQL 中最小的数据单位是表的列。当该列用于存储字符集合(即字符串,例如 varchar()、char()、text 等)时,MySQL 需要知道这些字符属于哪个字符集,以便 MySQL 能够正确地存储和解释它们。这三个选项的唯一目的是帮助最终以分层方式定义列的字符集。这三个选项分别是character_set_system、character_set_server和character_set_database。
默认值优先级,在指定时使用优先级 系统字符集character_set_system 服务器字符集 character_set_server 数据库字符集 character_set_database ;如果在CREATE TABLE 语句中未指定表字符集和排序规则,则将使用数据库的字符集和排序规则character_set_database 作为表的默认值,如果在列级定义了character set那优先级最高。
mysql> CREATE TABLE t1 ( c1 INT NOT NULL AUTO_INCREMENT PRIMARY KEY, -> c2 VARCHAR(100) COLLATE utf8mb4_general_ci, -> c3 VARCHAR(100) ) DEFAULT CHARACTER SET=latin1; Query OK, 0 rows affected (0.01 sec) mysql> desc t1; +-------+--------------+------+-----+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +-------+--------------+------+-----+---------+----------------+ | c1 | int | NO | PRI | NULL | auto_increment | | c2 | varchar(100) | YES | | NULL | | | c3 | varchar(100) | YES | | NULL | | +-------+--------------+------+-----+---------+----------------+ 3 rows in set (0.01 sec) mysql> select TABLE_SCHEMA,TABLE_NAME,COLUMN_NAME, CHARACTER_SET_NAME from information_schema.columns where table_name='t1'; +--------------+------------+-------------+--------------------+ | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | CHARACTER_SET_NAME | +--------------+------------+-------------+--------------------+ | anbob | t1 | c1 | NULL | | anbob | t1 | c2 | utf8mb4 | | anbob | t1 | c3 | latin1 | +--------------+------------+-------------+--------------------+ 3 rows in set (0.00 sec)
3. 在进行传输过程中字符集处理解释。
此组中有三个选项:character_set_client ,character_set_ connection和character_set_results 。
character_set_client
:客户端字符集character_set_connection
:连接字符集character_set_results
:查询结果字符集
MySQL 使用前两个选项来写入表:服务器将character_set_client系统变量作为客户端发送语句的字符集。服务器将客户端发送的语句从character_set_client转换为character_set_connection 。当 character_set_client 与 character_set_connection 相同时,无需进行转换。
mysql> set names gbk; Query OK, 0 rows affected (0.00 sec) mysql> select @@character_set_client,@@character_set_connection,@@character_set_results; +------------------------+----------------------------+-------------------------+ | @@character_set_client | @@character_set_connection | @@character_set_results | +------------------------+----------------------------+-------------------------+ | gbk | gbk | gbk | +------------------------+----------------------------+-------------------------+ 1 row in set (0.00 sec) mysql> select binary("张"); +----------------------------+ | binary("张") | +----------------------------+ | 0xD5C5 | +----------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> set character_set_connection=utf8mb4; Query OK, 0 rows affected (0.00 sec) mysql> select @@character_set_client,@@character_set_connection,@@character_set_results; +------------------------+----------------------------+-------------------------+ | @@character_set_client | @@character_set_connection | @@character_set_results | +------------------------+----------------------------+-------------------------+ | gbk | utf8mb4 | gbk | +------------------------+----------------------------+-------------------------+ 1 row in set (0.00 sec) mysql> select binary("张"); +----------------------------+ | binary("张") | +----------------------------+ | 0xE5BCA0 | +----------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> select binary(convert(convert('张' using gbk) using utf8mb4)); +----------------------------------------------------------------------------------------------------------------+ | binary(convert(convert('张' using gbk) using utf8mb4)) | +----------------------------------------------------------------------------------------------------------------+ | 0xE5BCA0 | +----------------------------------------------------------------------------------------------------------------+
从表中读取:character_set_results系统变量指示服务器向客户端返回查询结果时使用的字符集。要告诉服务器不对结果集或错误消息进行转换,请将character_set_results设置为NULL或binary 。
小结
MySQL 将进行潜在的转换以确保character_set_client、 character_set_connection和列的目标字符集对齐。需要根据具体使用场景选择合适的字符集和排序规则,尤其是在处理多语言或特殊字符时,推荐使用 utf8mb4
。
我们需要小心地执行转换,因为当 MySQL 从允许更多值(更大)的字符集转换为允许更少值(更小)的字符集(例如utf8mb4 -> latin1 )时,可能会导致数据丢失或出现“ ERROR 1366 (HY000) Incorrect string value ”。
References
https://www.percona.com/blog/how-mysql-uses-character_set-configurations/
目前这篇文章还没有评论(Rss)