MySQL8中ALGORITHM=INSTANT带来的风险小结
近日因为一个MySQL备份使用XtraBackup失败的问题,让我认识了一下MySQL的DDL中ALGORITHM=INSTANT算法,发现该问题在>=8.0.29引起的风险并不小这里整理一下,Oracle其实也一样,在不同的版本中有可能引入一些新特性同时,也可能引入一些风险bug, 如ACS、延时段创建、adaptive_log_file_sync 等在新版本中默认启用,目前发现至少在8.0.32之前因为ALGORITHM=INSTANT可能发生page损坏、实例crash、备份失败。
MySQL中在像增加字段等DDL操作时Alter table命令有一个ALGORITHM=选项,有默认值也可以“显示”指定, 值有COPY、Inplace、Instant. 顾名思义对数据的更改方式。之前在《“alter table ” modify column in Oracle、MySQL、PostGreSQL(数据库比较系列十三)》笔记有记录,在MySQL发现修改列长度居然还会导致MySQL数据重构,让我一个Oracle DBA有点吃惊,也许Oracle的方式有点像Instant, 不同的版本对DDL的支持略有不同可参考官方文档。
为什么影响MySQL 8.0.29 – 8.0.31?
DDL即时Instant 功能自MySQL 8.0.12版本引入, 但也不是默认ALGORITHM, 而且只能顺序加列, 仅支持在最后添加列,而不支持在现有列的中间添加列。这和oracle比较灵活,如oracle只能在尾部增加列。而mysql中可以指定增加列的位置,如:
–加在开始
alter table 表名 add column 字段名 varchar(255) FIRST;
— 加在某字段后
alter table 表名 add column 字段名 varchar(255) After 现有字段名;
在MySQL 8.0.29 之前,在线 DDL 操作中即时添加列只能添加在表的最后一列,对于在某个列位置快速添加列很不方便,从MySQL 8.0.29 开始扩展了对 ALTER TABLE … ALGORITHM=INSTANT 的支持:用户可以在表的任何位置即时添加列、即时删除列、添加列时评估行大小限制。该算法在8.0.12上成为alter table add column的默认算法, 从MySQL 8.0.29开始ALGORITHM=INSTANT该选项是默认值,如果未指定算法,ALGORITHM=INSTANT对于ALTER TABLE ADD/DROP COLUMN所有语句都使用此算法,这个改变导致redo格式级联的改变, 因为这个新特性,InnoDB redo log 格式对于所有 DML 操作都发生了变化。MySQL 官方 8.0.29 版本开始为了更多 DDL 支持使用 instant 算法引入了一种设计缺陷,此缺陷会导致instant add/drop columns 的表数据库数据块损坏。 从每次即时添加或删除列都会创建一个新的行版本。 MySQL 8.0.29后在 INFORMATION_SCHEMA.INNODB_TABLES 表中添加了一个新的列 TOTAL_ROW_VERSIONS 列来跟踪行版本的数量,每个表最多允许 64 行版本。
InnoDB now supports ALTER TABLE … DROP COLUMN operations using ALGORITHM=INSTANT.
Operations that support ALGORITHM=INSTANT only modify metadata in the data dictionary. Table data is unaffected, making the operations instantaneous. If not specified explicitly, ALGORITHM=INSTANT is used by default by DDL operations that support it.
Prior to MySQL 8.0.29, an instantly added column could only be added as the last column of the table. From MySQL 8.0.29, an instantly added column can be added to any position in the table.
Instantly added or dropped columns create a new row version. Up to 64 row versions are permitted. A new TOTAL_ROW_VERSIONS column was added to the INFORMATION_SCHEMA.INNODB_TABLES table to track the number of row versions.
https://dev.mysql.com/doc/relnotes/mysql/8.0/en/news-8-0-29.html
如出现坏页是mysql日志
.. InnoDB: about forcing recovery. 13:03:07 UTC - mysqld got signal 6 ; Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware. Thread pointer: 0x7fbcaca21730 Attempting backtrace. You can use the following information to find out where mysqld died. If you see no messages after this, something went terribly wrong... stack_bottom = 7fbcf75f9c30 thread_stack 0x100000 /usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x3d) [0x1fd31ed] /usr/sbin/mysqld(print_fatal_signal(int)+0x2df) [0xee954f] /usr/sbin/mysqld(my_server_abort()+0x7e) [0xee969e] /usr/sbin/mysqld(my_abort()+0xa) [0x1fcd4ca] /usr/sbin/mysqld(ut_dbg_assertion_failed(char const*, char const*, unsigned long)+0x31f) [0x226d14f] /usr/sbin/mysqld() [0x23939e6] /usr/sbin/mysqld(Fil_shard::do_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*)+0xc44) [0x23a5e64] /usr/sbin/mysqld(fil_io(IORequest const&, bool, page_id_t const&, page_size_t const&, unsigned long, unsigned long, void*, void*)+0x57) [0x23a5ee7] /usr/sbin/mysqld(buf_read_page_low(dberr_t*, bool, unsigned long, unsigned long, page_id_t const&, page_size_t const&, bool)+0x176) [0x22ea0f6] /usr/sbin/mysqld(buf_read_page(page_id_t const&, page_size_t const&)+0x46) [0x22ea576] .. /lib64/libpthread.so.0(+0x7e65) [0x7fbf77365e65] /lib64/libc.so.6(clone+0x6d) [0x7fbf7572288d] Trying to get some variables. Some pointers may be invalid and cause the dump to abort. Query (7fbcaca38b40): ALTER TABLE `xxx`.`` DROP COLUMN `xxx_NAME` Connection ID (thread ID): 33 Status: NOT_KILLED
另外一种场景当一个执行 Update SQL 的事务回滚时,由于原记录头信息中的 REC_INFO_VERSION_FLAG 标志位错误,导致计算出来的原记录长度过大,有一定几率MySQL 认为数据页的剩余空间不能容纳这条记录,就会触发断言bug导致 MySQL 宕机。
第三种场景由于 XtraBackup 无法处理MySQL社区版生成8.0.29引入的损坏的 redo log,MySQL 8.0.29/30/31中的问题使此版本无法安全地进行备份。 因此,如果 XtraBackup 8.0.29/30/31 版本检测到具有 INSTANT ADD/DROP 列的表,它将不会进行备份,并且会生成错误信息,列出受影响表的列表并提供将它们转换为常规表的说明。 如下
2022-07-01T15:18:35.127689+05:30 0 [ERROR] [MY-011825] [Xtrabackup] Found tables with row versions due to INSTANT ADD/DROP columns 2022-07-01T15:18:35.127714+05:30 0 [ERROR] [MY-011825] [Xtrabackup] This feature is not stable and will cause backup corruption. 2022-07-01T15:18:35.127723+05:30 0 [ERROR] [MY-011825] [Xtrabackup] Tables found: 2022-07-01T15:18:35.127730+05:30 0 [ERROR] [MY-011825] [Xtrabackup] test/t1 ... 2022-07-01T15:18:35.127752+05:30 0 [ERROR] [MY-011825] [Xtrabackup] Please run OPTIMIZE TABLE or ALTER TABLE ALGORITHM=COPY on all listed tables to fix this issue.
Percona XtraBackup 8.0.32-25允许MySQL 8.0.32备份中的即时列(注意XtraBackup和db server同时升级到32以后)。 如果您已经有此类表(以下有关如何查找此类表的信息),建议用户在进行备份之前对这些表进行操作,OPTIMIZE TABLE xx,xx;
这样的表通常有TOTAL_ROW_VERSIONS > 0,但并不是所有与此匹配的表都会损坏或不可读。在>= 8.0.29上检查这个查询:
SELECT NAME, TOTAL_ROW_VERSIONS FROM INFORMATION_SCHEMA.INNODB_TABLES WHERE TOTAL_ROW_VERSIONS > 0 order by TOTAL_ROW_VERSIONS DESC;
如果此查询显示空结果集,则没有问题。Percona Xtrabackup备份您的MySQL 8.0.29服务器。如果结果是表列表,请在进行备份之前在列表上运行优化表。
https://docs.percona.com/percona-xtrabackup/8.0/em/instant.html
已知的bug
Bug 34233264 – Assertion failure: rec.h:834:!rec_new_is_versioned(rec) || !rec_get_instant_flag_new(rec
A note about Bug 34233264: This is actual on-disk corruption of pages. Two flags (VERSION & INSTANT) should never be in set state together. Yet they were able to be set prior to this fix.
Bug 34243694 – DROP COLUMN with ALGORITHM=INSTANT causing sporadic table corruption
A note about Bug 34243694: INSTANT DDL didn’t actually cause corruption of existing rows on disk. The bug is that when 8.0.29 tries reading rows that were inserted by earlier versions after they did an INSTANT DDL prior to upgrading, the new rows were misinterpreted due to an error in calculating nullable bits.
The symptoms are crashes and/or corruption, however physical disk data is correctly readable by 8.0.30 which has the fix.
Bug 34181432 – Assert: rem0lrec.h:314:len == rec_get_nth_field_size_low(rec, n – n_drop)
Bug 34181432 Fixed as of the upcoming 8.0.30 release:
Only the physical position of instantly added columns was logged, which was not sufficient for index recovery. The logical position of columns was
also required.
Bug 34302445 – Assertion failure: dict0dd.cc:1693:dd_column_is_dropped(old_col)
Bug 34302445 is Fixed as of the upcoming 8.0.30 release:
A TRUNCATE TABLE operation failed to remove data dictionary entries for columns that were dropped using ALGORITHM=INSTANT.
Bug 34488482 – Assertion failure: fil0fil.cc:7533 thread 140697559873280
Bug 34488482 is Fixed as of the upcoming 8.0.31 release:
After upgrading to a release that supports row versions for columns added or dropped using ALGORITHM-INSTANT, a failure occurred during an instant ADD COLUMN operation on a table with a nullable column and an instantly added column.
总结
ALGORITHM=INSTANT从8.0.29开始为默认算法支持更多的DDL,并且因为redo格式一种涉及缺陷, 容易导致数据库产生块坏, 导致MySQL挂机,影响范围社区版MYSQL 8.0.29-8.0.31, 因为Percona XtraBackup无法解决MySQL社区版的问题,虽然在Percona MYSQL分支中都已修复,所以备份for percona任何版本均正常,我觉的Percona为了保证备份的安全,所以对于MySQL社区版修复之前,使用XtraBackup备份时会提示发现这类表让做optimizer table操作,这逻辑也合情合理,比备份了发现无法恢复更负责, 而社区版MySQL8.0.32修复问题后Percona XtraBackup-8.0.32-25版本也支持了该备份,不再提示。
防止该问题的出现,应用使用如下方式“显示”的指定ALGORITHM参数值INPLACE或copy,则可以正常DDL操作,MySQL实例不会报错, 如果已存在TOTAL_ROW_VERSIONS > 0 的表可以optimize table或导出重导入。
目前这篇文章有6条评论(Rss)评论关闭。