MySQL Seconds_behind_master反复横跳？-常见问题-开封晨展科技有限公司，开封网站制作，网站建设，小程序，APP开发

MySQL Seconds_behind_master反复横跳？

常见问题发布者：ou3377 2021-12-24 08:43 访问量：812

Seconds_behind_master反复横跳？

01 问题背景

今天在线上遇到了一个MySQL的问题，这里记录一下。

场景：

1、监控报警某个业务的从库有延迟

2、show slave status查看seconds_behind_master值反复在0、500、0、500、0、500之间跳动。

seconds_behind_master下面简称SBM。

02 排查过程

这种问题，更多的是先从官方文档上去查看一些蛛丝马迹。

SBM的具体意思，想必大家都知道，它是指主从复制的SQL线程落后于主库binlog的时间（也就是从库的relay-log中的时间）。

在官方文档上，找到一句话：

It is also possible that transient values for Seconds_Behind_Source may not reflect the situation accurately. When the replication SQL (applier) thread has caught up on I/O, Seconds_Behind_Source displays 0; but when the replication I/O (receiver) thread is still queuing up a new event, Seconds_Behind_Source may show a large value until the replication applier thread finishes executing the new event. This is especially likely when the events have old timestamps; in such cases, if you execute SHOW REPLICA STATUS several times in a relatively short period, you may see this value change back and forth repeatedly between 0 and a relatively large value.

链接：https://dev.mysql.com/doc/refman/8.0/en/replication-administration-status.html

这不就是我们遇到的情况嘛。

这段话翻译过来就是：在一些场景下，SBM的瞬时值并不能反映准确的情况。当复制的SQL线程追赶上IO线程的时候，这个值显示为0；当复制IO在等待新的事件的时候，SBM值显示为一个较大的值，直到复制线程执行完了这个新的event。而这种情况在某个事件具有旧的时间戳的时候，经常容易发生。在这种情况下，你就会看到SBM值在0和一个较大的值之间反复横跳。

有了上述的理论基础，我们来看我们的真实情况：

先来看主库上真实的binlog位置：

show master statusG
*************************** 1. row ***************************
             File: mysql-bin.004718
         Position: 841616814
     Binlog_Do_DB:
Binlog_Ignore_DB:
Executed_Gtid_Set:
1 row in set (0.00 sec)

可以看到，主库的binlog是4718这个号。

再来看下从库复制的主库位置：

show slave statusG
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.xx.xx.xx
                  Master_User: replica
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.004717
          Read_Master_Log_Pos: 353151416
               Relay_Log_File: relay-bin.002983
                Relay_Log_Pos: 353151460
        Relay_Master_Log_File: mysql-bin.004717
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
          Exec_Master_Log_Pos: 353151301
              Relay_Log_Space: 353151781

可以看到，从库复制的主库binlog编号是4717.

当前这种情况，就是从库的IO线程比较慢，复制的主库的binlog不是最新的binlog，而是上一个binlog，自然而然就带有了这个老旧的时间戳。所以从库的IO线程在排队等待某个事务中的新事件的时候，就使得SBM值变为一个比较大的数字，（500s）。然后等到复制SQL线程追上IO线程的时候，这个SBM的值又变成了0.

以上猜测，在实际分析binlog的时候，也被证明了。当前的从库的relay log中，确实有很多超级大的insert操作，而且从库的relay log中的时间戳，记录的时间基本都是当前从库时间9分钟以前，恰好500s左右。

03 几点说明

关于SBM值，还有几点需要说明：

1、在从库忙的时候，SBM记录的是从库时间戳和来自主库的binlog中的时间戳之间的差异

2、当从库空闲时，如果I/O和SQL线程状态双Yes，SBM=0，如果有任意一个线程状态不为Yes，SBM=NULL

3、假设主从服务器时间戳不一致，如果从库复制线程启动之后，没有做过任何时间变更，那么SBM的值也可以正常计算，但是如果修改了从库或者主库的时间，则可能导致时钟偏移，从而导致这个计算值不可靠

4、如果从库上有人为写入新数据，则可能导致SBM的值随机波动。

5、在网络条件较差的环境下，如果IO 线程延时严重，即使SBM的值为0，也不能说明主从之间没有延迟。

关键字：MySQL

http://m.chenzhankj.com/cjwt/852.html