MySQL的COMMIT_ORDER模式下组提交分组实现与BUG案例源码剖析

MySQL的COMMIT_ORDER模式下组提交分组实现与BUG案例源码剖析

背景

自MySQL 5.7以来,组提交大面积应用,已经不断地得到优化。但网上有关组提交的实现机制,却还不够详细。故障多的时候,往往会发生一些模棱两可的揣测和猜疑。因此,笔者有了从自己的角度,去分析组提交实现的动机。

源码分析

以“last_committed”为入口,搜索MySQL 5.7.24源码,很快可以定位到关键类Transaction_dependency_tracker。梳理一下该类的虚函数实现以及调用位置,基本就理解了大部分实现。如下:

以此类为基础,再梳理调用过程,结合前面文章《MySQL的after_sync与after_commit性能之争源码剖析与情景测试》的分析,基本就可以梳理出具体的实现细节。

  • 基本实现

通过类Commit_order_trx_dependency_tracker的两个成员变量来基本实现:m_max_committed_transaction,保存已提交的最大事务号;m_transaction_counter,保存已prepare的最大事务号。

  • 获取last_committed与sequence_number值

如上图所示,实现该功能的是get_dependency函数,把last_committed与sequence_number的绝对值转换成相对值。而这个转换恰巧是在Binlog的flush阶段MYSQL_BIN_LOG::write_gtid函数中,生成GTID事件之前。

  • 更新m_max_committed_transaction值

update_max_committed函数的调用发生在commit队列处理函数MYSQL_BIN_LOG::process_commit_stage_queue中,遍历每个线程提交最开始的时候。

  • 线程获取m_max_committed_transaction值

对于非事务型语句(如DDL)来说,在binlog_prepare中获取last_committed绝对值。而对于事务型语句来说,则在提交的函数MYSQL_BIN_LOG::commit中,执行ordered_commit函数之前获取last_committed绝对值。

  • sequence_number值累加

step()函数的调用也是在Binlog的flush阶段,binlog_cache_data::flush函数中,不过是在MYSQL_BIN_LOG::write_gtid函数之前,即把绝对值转换成相对值之前。具体实现为,sequence_number每次加1。

梳理出来的流程如下所示:

案例分析

初看时,想当然的会以为,两个DDL进入了同一组导致了死锁。然后就会着重分析为什么在主库被分配了同一组,而在从库却又冲突了。从上图的分析可以了解,获取last_committed值是在执行ordered_commit函数之前。这个时候,线程所有持有的MDL锁并未释放,所以两者存在冲突,根本就不可能同时进入提交。但仔细回头一看,就会发现不对,两个语句根本就不是DDL,而是DCL呀!为了了解DCL中关于MDL锁的释放位置,就在测试环境抓取分别抓取它们的debug日志来分析。

  • flush privileges
...
// 第一部分
3343 T@5: | | | | | >MDL_context::release_transactional_locks
3344 T@5: | | | | | | >MDL_context::release_locks_stored_before
3345 T@5: | | | | | | <MDL_context::release_locks_stored_before 4439
3346 T@5: | | | | | | >MDL_context::release_locks_stored_before
3347 T@5: | | | | | | | info: found lock to release ticket=0x7f3c64012410
3348 T@5: | | | | | | | >MDL_context::release_lock
3349 T@5: | | | | | | | | enter: db=mysql name=proxies_priv
3350 T@5: | | | | | | | <MDL_context::release_lock 4399
3351 T@5: | | | | | | | info: found lock to release ticket=0x7f3c64012320
3352 T@5: | | | | | | | >MDL_context::release_lock
3353 T@5: | | | | | | | | enter: db=mysql name=db
3354 T@5: | | | | | | | <MDL_context::release_lock 4399
3355 T@5: | | | | | | | info: found lock to release ticket=0x7f3c6400e860
3356 T@5: | | | | | | | >MDL_context::release_lock
3357 T@5: | | | | | | | | enter: db=mysql name=user
3358 T@5: | | | | | | | <MDL_context::release_lock 4399
3359 T@5: | | | | | | <MDL_context::release_locks_stored_before 4447
3360 T@5: | | | | | <MDL_context::release_transactional_locks 4718
3361 T@5: | | | | <acl_reload 2148
...
// 第二部分
5073 T@5: | | | | | >MDL_context::release_transactional_locks
5074 T@5: | | | | | | >MDL_context::release_locks_stored_before
5075 T@5: | | | | | | <MDL_context::release_locks_stored_before 4439
5076 T@5: | | | | | | >MDL_context::release_locks_stored_before
5077 T@5: | | | | | | | info: found lock to release ticket=0x7f3c640308e0
5078 T@5: | | | | | | | >MDL_context::release_lock
5079 T@5: | | | | | | | | enter: db=mysql name=procs_priv
5080 T@5: | | | | | | | <MDL_context::release_lock 4399
5081 T@5: | | | | | | | info: found lock to release ticket=0x7f3c640273e0
5082 T@5: | | | | | | | >MDL_context::release_lock
5083 T@5: | | | | | | | | enter: db=mysql name=columns_priv
5084 T@5: | | | | | | | <MDL_context::release_lock 4399
5085 T@5: | | | | | | | info: found lock to release ticket=0x7f3c6400e860
5086 T@5: | | | | | | | >MDL_context::release_lock
5087 T@5: | | | | | | | | enter: db=mysql name=tables_priv
5088 T@5: | | | | | | | <MDL_context::release_lock 4399
5089 T@5: | | | | | | <MDL_context::release_locks_stored_before 4447
5090 T@5: | | | | | <MDL_context::release_transactional_locks 4718
5091 T@5: | | | | <grant_reload 2666
...
// 第三部分
5641 T@5: | | | | | >restore_backup_open_tables_state
5642 T@5: | | | | | | >MDL_context::rollback_to_savepoint
5643 T@5: | | | | | | | >MDL_context::release_locks_stored_before
5644 T@5: | | | | | | | <MDL_context::release_locks_stored_before 4439
5645 T@5: | | | | | | | >MDL_context::release_locks_stored_before
5646 T@5: | | | | | | | | info: found lock to release ticket=0x7f3c6400e860
5647 T@5: | | | | | | | | >MDL_context::release_lock
5648 T@5: | | | | | | | | | enter: db=mysql name=servers
5649 T@5: | | | | | | | | <MDL_context::release_lock 4399
5650 T@5: | | | | | | | <MDL_context::release_locks_stored_before 4447
5651 T@5: | | | | | | <MDL_context::rollback_to_savepoint 4700
5652 T@5: | | | | | <restore_backup_open_tables_state 3751
5653 T@5: | | | | | >my_hash_free
5654 T@5: | | | | | | enter: hash: 0x7f3c64002eb0
5655 T@5: | | | | | <my_hash_free 185
5656 T@5: | | | | | info: unlocking servers_cache
5657 T@5: | | | | <servers_reload 274
...
// 第四部分
5764 T@5: | | | | >trans_commit_stmt
5765 T@5: | | | | | debug: add_unsafe_rollback_flags: 0
5766 T@5: | | | | | >MYSQL_BIN_LOG::commit
5767 T@5: | | | | | | info: query='flush privileges'
...

以上分别截取了三部分:第一部分,对mysql.proxies_priv、mysql.db以及mysql.user释放MDL锁;第二部分,对mysql.procs_priv、mysql.columns_priv以及mysql.tables_priv释放MDL锁;第三部分,对mysql.servers释放MDL锁;第四部分,提交。由此,可以发现,MDL锁在进入提交之前就已经释放了。为了进一步确认,于是对函数MDL_context::acquire_lock设置断点,调试结果如下:

因此,可以发现,获取的都是SR锁。

  • grant privileges
...
// 第一部分
2477 T@4: | | | | | >trans_commit_stmt
2478 T@4: | | | | | | debug: add_unsafe_rollback_flags: 0
2479 T@4: | | | | | | >MYSQL_BIN_LOG::commit
2480 T@4: | | | | | | | info: query='grant process on *.* to 'slave'@'192.168.10.%''
...
2724 T@4: | | | | | | | | | >THD::enter_cond
2725 T@4: | | | | | | | | | | THD::enter_stage: 'Waiting for semi-sync ACK from slave' /data/mysql-5.7.24/plugin/semisync/semisync_master.cc:735
2726 T@4: | | | | | | | | | | >PROFILING::status_change
2727 T@4: | | | | | | | | | | <PROFILING::status_change 364
2728 T@4: | | | | | | | | | <THD::enter_cond 3147
...
2816 T@4: | | | | | <trans_commit_stmt 474
2817 T@4: | | | | | >trans_commit_implicit
...
2843 T@4: | | | | | <trans_commit_implicit 332
// 第二部分
2950 T@4: | | | | | >MDL_context::release_transactional_locks
2951 T@4: | | | | | | >MDL_context::release_locks_stored_before
2952 T@4: | | | | | | | info: found lock to release ticket=0x7f3c7000e9b0
2953 T@4: | | | | | | | >MDL_context::release_lock
2954 T@4: | | | | | | | | enter: db= name=
2955 T@4: | | | | | | | <MDL_context::release_lock 4399
2956 T@4: | | | | | | <MDL_context::release_locks_stored_before 4447
2957 T@4: | | | | | | >MDL_context::release_locks_stored_before
2958 T@4: | | | | | | | info: found lock to release ticket=0x7f3c7000e760
2959 T@4: | | | | | | | >MDL_context::release_lock
2960 T@4: | | | | | | | | enter: db=mysql name=db
2961 T@4: | | | | | | | <MDL_context::release_lock 4399
2962 T@4: | | | | | | | info: found lock to release ticket=0x7f3c7000ea10
2963 T@4: | | | | | | | >MDL_context::release_lock
2964 T@4: | | | | | | | | enter: db=mysql name=user
2965 T@4: | | | | | | | <MDL_context::release_lock 4399
2966 T@4: | | | | | | <MDL_context::release_locks_stored_before 4447
2967 T@4: | | | | | <MDL_context::release_transactional_locks 4718
...

以上分别截取了两部分:第一部分执行提交;第二部分,对mysql.db和mysql.user释放锁。依照上面的方式,同样进行调试,结果如下:

此时,可以发现,上述debug日志中的第一个锁为IX类型的GLOBAL锁,其它两个都是SW锁。

  • 死锁分析

从以上的分析,基本可以认定flush privilegesgrant privileges的执行过程有很大的不同,如果SW锁和SR锁不兼容的话,基本可以认定,前者先执行,后者可以并行,反之则不行。为此,进行一下测试(让语句处理ACK等待状态),先flush privilegesgrant privileges

mysql> show processlist;
+-----+------+-----------+------+---------+------+--------------------------------------+------------------------------------------------+
| Id  | User | Host      | db   | Command | Time | State                                | Info                                           |
+-----+------+-----------+------+---------+------+--------------------------------------+------------------------------------------------+
| 196 | root | localhost | NULL | Query   | 2449 | Waiting for semi-sync ACK from slave | flush privileges                               |
| 197 | root | localhost | NULL | Query   | 2430 | checking permissions                 | grant process on *.* to 'slave'@'192.168.10.%' |
| 198 | root | localhost | NULL | Query   |    0 | starting                             | show processlist                               |
+-----+------+-----------+------+---------+------+--------------------------------------+------------------------------------------------+
3 rows in set (0.00 sec)

线程197的堆栈信息如下:

反之,先执行grant privileges再flush privileges,

mysql> show processlist;
+-----+-------+--------------------+------+------------------+------+---------------------------------------------------------------+------------------------------------------------+
| Id  | User  | Host               | db   | Command          | Time | State                                                         | Info                                           |
+-----+-------+--------------------+------+------------------+------+---------------------------------------------------------------+------------------------------------------------+
| 196 | root  | localhost          | NULL | Query            |   12 | Waiting for table level lock                                  | flush privileges                               |
| 197 | root  | localhost          | NULL | Query            |   23 | Waiting for semi-sync ACK from slave                          | grant process on *.* to 'slave'@'192.168.10.%' |
| 198 | root  | localhost          | NULL | Query            |    0 | starting                                                      | show processlist                               |
| 199 | slave | 192.168.10.4:39721 | NULL | Binlog Dump GTID |  133 | Master has sent all binlog to slave; waiting for more updates | NULL                                           |
+-----+-------+--------------------+------+------------------+------+---------------------------------------------------------------+------------------------------------------------+
4 rows in set (0.00 sec)

线程196的堆栈信息如下:

通过以上的堆栈信息,可以惊讶地发现,线程196已经进入了mysql_lock_tables函数,即已经拿到了MDL锁,足以说明SR锁和SW锁是兼容的。通过上面的函数调用,不难发现,它们正在等待的,是表锁【InnoDB实现的表锁,存储引擎层只定义了表锁实现接口】。通过对比debug日志,不难发现,该锁的释放时间与MDL锁是紧贴在MDL锁前面的,因而以上的结论是成立的,只是对象由MDL锁变成了InnoDB表锁。

  • 故障复现

主库设置:

注意:binlog_group_commit_sync_delay最大值为100万微妙,即1s。
mysql> show variables like '%group_commit%';
+-----------------------------------------+---------+
| Variable_name                           | Value   |
+-----------------------------------------+---------+
| binlog_group_commit_sync_delay          | 1000000 |
| binlog_group_commit_sync_no_delay_count | 1000    |
+-----------------------------------------+---------+
2 rows in set (0.01 sec)

执行脚本:

在主库先执行flush privileges再执行grant privileges以保证进入同一组;然后线程1睡眠0.5s以保证flush privileges先获取sequence_number值,即在从库先提交;最后,在从库让先执行的线程后提交。
import threading
import time
import os


def thread_1():
    os.system('date +%M:%S')
    time.sleep(0.5)
    os.system("/usr/local/mysql/bin/mysql -uroot -ps3cret -e \"grant process on *.* to 'slave'@'192.168.10.%';\"")
    os.system('date +%M:%S')


def thread_2():
    os.system('date +%M:%S')
    os.system("/usr/local/mysql/bin/mysql -uroot -ps3cret -e 'flush privileges'")
    os.system('date +%M:%S')

t1 = threading.Thread(target=thread_2)
t1.start()
t2 = threading.Thread(target=thread_1)
t2.start()

从库现象:

  • 结论

flush privileges设计的提交前就释放锁的机制导致了死锁,依次类推,针对mysql.user表的DML语句与flush privileges配合,也有可能产生死锁。

总结

组提交的实现依赖于两阶段提交,因而对于非事务性语句来说,锁的释放可能会出现在事务提交前,因而可能会出现本来冲突的两个语句获得同一个last_committed,即被分配到了同一组,进而导致了并行复制死锁。

编辑于 2020-12-20 22:38