首页 » MySQL/TiDB/GoldenDB » MySQL 5.7同步延迟案例1: FLUSH PRIVILEGES死锁
MySQL 5.7同步延迟案例1: FLUSH PRIVILEGES死锁
一套MySQL V5.7主从同步(MTS)检查时延时较高,这是mysql的常见现象,如果重启备库时提示err 1236,又是mysql的常见错误, 个人比较排斥MySQL(当然这是个人主观意愿,但并不否认它的流行和某些场景的能力),遇到了这个问题就跟着看看,分析slave上的进程发现 GRANT 在 FLUSH PRIVILEGES 之前获得锁时出现死锁,简单记录这个5.7 bug。
How to check Replica_SQL health
-- from source: show master status; -- From Replica show slave status\G
Note:
MySQL 8后建议使用show replica ,show slave 已过时。
show slave 关注指标
Replica_IO_Running: Yes Replica_SQL_Running: Yes ... Replica_IO_State: Waiting for source to send event ... Source_Log_File: mysql-bin.000006 Read_Source_Log_Pos: 1437728 Relay_Log_File: mysql-relay.000013 Relay_Log_Pos: 453735 Relay_Source_Log_File: mysql-bin.000006 ... Exec_Source_Log_Pos: 453522 Relay_Log_Space: 1438184 ... Seconds_Behind_Source: 217 ... Retrieved_Gtid_Set: 00020192-1111-1111-1111-111111111111:1-105931 Executed_Gtid_Set: 00020192-1111-1111-1111-111111111111:1-103277
本案例现象
mysql> show slave status \G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 10.230.xxx.xxx Master_User: xxx Master_Port: 10000 Connect_Retry: 1 Master_Log_File: mysql-bin.002375 Read_Master_Log_Pos: 786639555 Relay_Log_File: mysql-relay.004188 Relay_Log_Pos: 311803492 Relay_Master_Log_File: mysql-bin.001396 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 311803279 Relay_Log_Space: 1140905480027 ... Seconds_Behind_Master: 15937230 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 200922033 Master_UUID: 8690d6b3-fc9c-11ea-97c2-8446fe2f3274 Master_Info_File: mysql.slave_master_info SQL_Delay: 0 SQL_Remaining_Delay: NULL Slave_SQL_Running_State: Waiting for dependent transaction to commit Master_Retry_Count: 86400 Master_Bind: Last_IO_Error_Timestamp: Last_SQL_Error_Timestamp: Master_SSL_Crl: Master_SSL_Crlpath: Retrieved_Gtid_Set: 8690d6b3-fc9c-11ea-97c2-8446fe2f3274:121233235-1775703749 Executed_Gtid_Set: 8690d6b3-fc9c-11ea-97c2-8446fe2f3274:1-1075955764 Auto_Position: 1 Replicate_Rewrite_DB: Channel_Name: Master_TLS_Version: 1 row in set (0.01 sec)
Note:
这是一主两从的架构,1个slave库已重启提示1236, 上面这输出是另一套未重启的slave 。
查看Slave进程
mysql> show processlist;
+----------+-----------------+---------------------+------+---------+----------+---------------------------------------------+-----------------------------------------------------------+
| Id | User | Host | db | Command | Time | State | Info |
+----------+-----------------+---------------------+------+---------+----------+---------------------------------------------+-----------------------------------------------------------+
| 2 | system user | | NULL | Connect | 15936302 | Waiting for dependent transaction to commit | NULL |
| 3 | system user | | | Connect | 15936302 | Waiting for table level lock | flush privileges |
| 4 | system user | | | Killed | 15936302 | Waiting for preceding transaction to commit | GRANT SHOW VIEW, EVENT, TRIGGER ON *.* TO 'myxxxx_bk'@'%' |
| 5 | system user | | NULL | Connect | 15936329 | Waiting for an event from Coordinator | NULL |
| 6 | system user | | NULL | Connect | 15936329 | Waiting for an event from Coordinator | NULL |
| 7 | system user | | NULL | Connect | 15936329 | Waiting for an event from Coordinator | NULL |
| 8 | system user | | NULL | Connect | 15936329 | Waiting for an event from Coordinator | NULL |
| 9 | system user | | NULL | Connect | 15936329 | Waiting for an event from Coordinator | NULL |
| 10 | system user | | NULL | Connect | 15936359 | Waiting for an event from Coordinator | NULL |
| 11 | event_scheduler | localhost | NULL | Daemon | 39435600 | Waiting on empty queue | NULL |
| 14424427 | system user | | NULL | Connect | 126323 | Waiting for master to send event | NULL |
| 14427736 | root | 10.xxx.xxx.61:42354 | NULL | Query | 117100 | Killing slave | stop slave |
| 14437090 | root | 10.xxx.xxx.45:33436 | NULL | Query | 91887 | checking permissions | stop slave |
| 14470519 | root | localhost | NULL | Query | 0 | starting | show processlist |
| 14470849 | mydata_ha | 10.xxx.xxx.45:32978 | NULL | Sleep | 0 | | NULL |
| 14470850 | mydata_ha | 10.xxx.xxx.45:32982 | NULL | Sleep | 0 | | NULL |
| 14470851 | mydata_ha | 10.xxx.xxx.45:32986 | NULL | Sleep | 0 | | NULL |
+----------+-----------------+---------------------+------+---------+----------+---------------------------------------------+-----------------------------------------------------------+
17 rows in set (0.00 sec)
mysql> select * from performance_schema.threads where PROCESSLIST_ID in (3,4)\G
*************************** 1. row ***************************
THREAD_ID: 52
NAME: thread/sql/slave_worker
TYPE: FOREGROUND
PROCESSLIST_ID: 3
PROCESSLIST_USER: NULL
PROCESSLIST_HOST: NULL
PROCESSLIST_DB: NULL
PROCESSLIST_COMMAND: Connect
PROCESSLIST_TIME: 15936422
PROCESSLIST_STATE: Waiting for table level lock
PROCESSLIST_INFO: flush privileges
PARENT_THREAD_ID: 51
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: NULL
THREAD_OS_ID: 67865
*************************** 2. row ***************************
THREAD_ID: 53
NAME: thread/sql/slave_worker
TYPE: FOREGROUND
PROCESSLIST_ID: 4
PROCESSLIST_USER: NULL
PROCESSLIST_HOST: NULL
PROCESSLIST_DB: NULL
PROCESSLIST_COMMAND: Connect
PROCESSLIST_TIME: 15936422
PROCESSLIST_STATE: Waiting for preceding transaction to commit
PROCESSLIST_INFO: GRANT SHOW VIEW, EVENT, TRIGGER ON *.* TO 'myxxxx'@'%'
PARENT_THREAD_ID: 51
ROLE: NULL
INSTRUMENTED: YES
HISTORY: YES
CONNECTION_TYPE: NULL
THREAD_OS_ID: 67866
2 rows in set (0.00 sec)
Note:
Waiting for table level lock 常见于MyISAM 存储引擎, MySQL 5.7中的数据字典默认存储引擎, 8.0后为InnoDB. 当前正在执行的是flush privileges和GRANT SHOW VIEW, EVENT, TRIGGER xxx
使用mysqlbinlog 分析relay log
# at 311808069 #210825 0:00:00 server id 200922033 end_log_pos 311807921 CRC32 0x5710eff2 GTID last_committed=255033 sequence_number=255034 rbr_only=no SET @@SESSION.GTID_NEXT= '8690d6b3-fc9c-11ea-97c2-8446fe2f3274:1075955765'/*!*/; # at 311808134 #210825 0:00:00 server id 200922033 end_log_pos 311808008 CRC32 0xb7cb4c60 Query thread_id=9776243 exec_time=0 error_code=0 SET TIMESTAMP=1629820800/*!*/; flush privileges /*!*/; # at 311808221 #210825 0:00:00 server id 200922033 end_log_pos 311808073 CRC32 0x823bb746 GTID last_committed=255033 sequence_number=255035 rbr_only=no SET @@SESSION.GTID_NEXT= '8690d6b3-fc9c-11ea-97c2-8446fe2f3274:1075955766'/*!*/; # at 311808286 #210825 0:00:00 server id 200922033 end_log_pos 311808209 CRC32 0xbd6a96f9 Query thread_id=9776244 exec_time=0 error_code=0 SET TIMESTAMP=1629820800/*!*/; GRANT SHOW VIEW, EVENT, TRIGGER ON *.* TO 'myxxx_bk'@'%' /*!*/;
Note:
FLUSH PRIVILEGES 和 GRANT 有相同的last_committed number。
搜索匹配MySQL Bug #89229 .
Description: FLUSH PRIVILEGES commit its transaction and release the table locks before binlog. That means a following GRANT statement could be binlogged in the same commit window with the FLUSH PRIVILEGES statement. Thus the FLUSH PRIVILEGES and the GRANT can be applied parallel on slave. That may cause a deadlock if slave-preserve-commit-order is ON and GRANT gets the locks before FLUSH PRIVILEGES. The deadlock looks like: - GRANT (applied before FLUSH PRIVILEGES and the locks will be hold until commit) waiting for FLUSH PRIVILEGES to commit - FLUSH PRIVILEGES(blocked when acquiring the locks.) waiting for GRANT to release the locks.
对不起,这篇文章暂时关闭评论。