GoldenDB: Repairing a Distributed Data Dictionary Inconsistency (ERROR 3508)
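GoldenDB is a distributed database, and corruption of its data dictionary or metadata is a severe failure that must be handled carefully. In a primary/standby architecture, the first option is usually a primary/standby switchover followed by recovery from the standby, which keeps downtime short. Without a standby, you may have to restore from backups, for example a full backup plus incrementals, into a new instance and then verify the data. I recently hit such a case while adding partitions during partition table maintenance, on version GoldenDB-ALL-DBCLUSTERV6.1.03.07SP5.r4895784. The error was: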
ERROR 3508 (HY000): Dictionary object id (xxx) does not exist.(Proxyid:1 Clusterid:1 Groupid:xx DBid:xx);part of DDL may succeed,please manual recovery!
The error message says that part of the DDL may have succeeded. Note the DBid, the shard Groupid, and the dictionary object id it reports.
DDL operation
mysql> alter table anbob.tab_xxx reorganize partition PART_300_202503L into (partition PART_300_202502L values less than (300,20250300000000),partition PART_300_202503L values less than (300,20250400000000));
ERROR 3508 (HY000): Dictionary object id (1018291) does not exist.(Proxyid:1 Clusterid:1 Groupid:5 DBid:17);part of DDL may succeed,please manual recovery!
Note:
The partition split failed, but part of it succeeded.
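The Groupid, DBid and object id in the error above tell you which shard to look at first. As a small illustration, the sketch below extracts them with a regular expression. The function name `parse_error_3508` and the pattern are my own assumptions, derived only from the message format shown in this post; this is not a GoldenDB tool, and other versions may word the message differently.

```python
import re

# Pattern built only from the message shown in this post (an assumption;
# other GoldenDB versions may format the message differently).
_ERR_RE = re.compile(
    r"ERROR (?P<errno>\d+) \((?P<sqlstate>\w+)\): "
    r"Dictionary object id \((?P<object_id>\d+)\) does not exist\."
    r"\(Proxyid:(?P<proxy_id>\d+) Clusterid:(?P<cluster_id>\d+) "
    r"Groupid:(?P<group_id>\d+) DBid:(?P<db_id>\d+)\)"
)


def parse_error_3508(message):
    """Return the numeric fields of the error as a dict, or None if no match."""
    m = _ERR_RE.search(message)
    if m is None:
        return None
    fields = {k: int(v) for k, v in m.groupdict().items() if k != "sqlstate"}
    fields["sqlstate"] = m.group("sqlstate")
    return fields


if __name__ == "__main__":
    msg = ("ERROR 3508 (HY000): Dictionary object id (1018291) does not exist."
           "(Proxyid:1 Clusterid:1 Groupid:5 DBid:17);"
           "part of DDL may succeed,please manual recovery!")
    info = parse_error_3508(msg)
    print(info["group_id"], info["db_id"], info["object_id"])
```

Here the failing shard is group 5 (DBid 17), which is the first one checked in the verification step.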
Verification
Check whether the table metadata is unreadable on every shard or only on some of them.
mysql> show create table anbob.tab_xxx storagedb g5
    -> \G
ERROR 1030 (HY000): Got error 122 - 'Internal (unspecified) error in handler' from storage engine(Proxyid:1 Clusterid:1 Groupid:5 DBid:17)

mysql> show table status like 'tab_xxx'\G
*************************** 1. row ***************************
NAME: tab_xxx
ENGINE: InnoDB
VERSION: 10
ROW_FORMAT: Dynamic
ROWS: 0
AVG_ROW_LENGTH: 0
DATA_LENGTH: 0
MAX_DATA_LENGTH: 0
INDEX_LENGTH: 9959768064
DATA_FREE: 0
AUTO_INCREMENT: 0
CREATE_TIME: 2025-01-22 00:30:57
UPDATE_TIME: 0000-00-00 00:00:00
CHECK_TIME: NULL
COLLATION: utf8mb4_bin
CHECKSUM: NULL
CREATE_OPTIONS: DUPLICATE="Y" partitioned
COMMENT: Got error 122 - 'Internal (unspecified) error in handler' from storage engine
1 row in set (0.05 sec)

mysql> show create table anbob.tab_xxx storagedb g1;
ERROR 1030 (HY000): Got error 122 - 'Internal (unspecified) error in handler' from storage engine(Proxyid:1 Clusterid:1 Groupid:1 DBid:1)

mysql> show create table anbob.tab_xxx storagedb g3;   -- fine
Note:
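This confirms that the metadata is inconsistent on only some of the shards. Shard 3 in this case is healthy, and normal DML and queries continue to work. However, because the DDL state is inconsistent on some shards, every subsequent DDL on this table is rejected until the table is manually recovered.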
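With more shard groups, repeating this check by hand gets tedious. The sketch below only builds the `show create table ... storagedb gN` statements used in this post and classifies captured client output by looking for error 122. The helper names `build_checks` and `classify` are hypothetical, the group ids are just the ones seen in this case, and actually running the statements is left to whatever client you use.

```python
def build_checks(table, group_ids):
    """Return one per-shard metadata check statement for each group id."""
    return {g: f"show create table {table} storagedb g{g};" for g in group_ids}


def classify(outputs):
    """Split group ids into (broken, healthy) based on captured client output.

    A group counts as broken when its output contains the handler error 122
    seen in this case; anything else is treated as healthy.
    """
    broken, healthy = [], []
    for group_id, text in sorted(outputs.items()):
        (broken if "Got error 122" in text else healthy).append(group_id)
    return broken, healthy


if __name__ == "__main__":
    for stmt in build_checks("anbob.tab_xxx", [1, 3, 5]).values():
        print(stmt)

    # Outputs as observed in this case (the g3 output is abbreviated).
    err = "ERROR 1030 (HY000): Got error 122 - 'Internal (unspecified) error in handler' from storage engine"
    observed = {5: err, 1: err, 3: "CREATE TABLE ..."}
    print(classify(observed))
```

Matching on the error text is deliberately crude; it is enough to list which shards need a switchover and which are still serving consistent metadata.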
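Resolution approach
Each shard runs in a primary/replica architecture. On the replica nodes, the same DDL metadata queries return no error and the metadata is up to date. The root cause has not been confirmed yet, and it may be a bug. For this case, the fix is therefore to perform a primary/replica switchover on each failing shard, and then run a shard-level recovery on the former primary node that reported the error.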