Last_IO_Error: Got fatal error 1236 from master when reading data from binary log

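While running the last MySQL NBU backup, I found that the slave had a problem. Oddly, the abnormal replication state had not triggered any alert. I set that aside for now: fix this problem first, then improve the alerting.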

I. Error message

On the slave, show slave status \G reports the following error:

Slave_IO_Running: No
Slave_SQL_Running: Yes
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position; the first event 'mysql-bin.000081' at 480141113, the last event read from './mysql-bin.000081' at 4, the last byte read from './mysql-bin.000081' at 4.'

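II. Cause

The slave's io_thread has stopped with error 1236. The slave asked the master to start replication from a binlog position that the master cannot serve (the first event 'mysql-bin.000081' at 480141113, the last event read from './mysql-bin.000081' at 4), so replication could not be established.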
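
To make the requested coordinates explicit, the file name and position can be pulled out of the Last_IO_Error text. The following is a minimal Python sketch, assuming the message keeps the wording shown above; the function name requested_position is hypothetical and not part of any MySQL tool.

```python
import re

# Matches: the first event 'mysql-bin.000081' at 480141113
# \W? tolerates straight or typographic quotes around the file name.
_REQUESTED = re.compile(r"the first event\s+\W?([\w.\-]+)\W?\s+at\s+(\d+)")


def requested_position(last_io_error):
    """Return (binlog_file, position) requested by the slave, or None."""
    match = _REQUESTED.search(last_io_error)
    if match is None:
        return None
    return match.group(1), int(match.group(2))
```

For the message in this incident the sketch returns the pair mysql-bin.000081 and 480141113, which are the values to compare against the master's binlog in the steps that follow.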

III. Solution

1. Check the slave status and the binlog positions it has read and executed

mysql> show slave status \G
*************************** 1. row ***************************
Slave_IO_State:
Master_Host: xx.xx.xx.xx
Master_User: username
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000081
Read_Master_Log_Pos: 480141113
Relay_Log_File: mysql9017-relay-bin.000163
Relay_Log_Pos: 480141259
Relay_Master_Log_File: mysql-bin.000081
Slave_IO_Running: No
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 480141113
Relay_Log_Space: 480141462
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 1236
Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position; the first event 'mysql-bin.000081' at 480141113, the last event read from './mysql-bin.000081' at 4, the last byte read from './mysql-bin.000081' at 4.'
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 17
1 row in set (0.00 sec)
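
Since this failure went unnoticed until a backup ran, a check over this output is a natural basis for an alert. The sketch below is only an illustration in Python: it assumes the vertical (\G) output has been captured as text, and the names parse_slave_status and replication_problems are hypothetical.

```python
def parse_slave_status(text):
    """Parse the vertical (\\G) output of SHOW SLAVE STATUS into a dict."""
    status = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip the row separator, the prompt and the "1 row in set" line.
        if not line or line.startswith("*") or ":" not in line:
            continue
        # Split on the first colon only; error texts contain more colons.
        key, _, value = line.partition(":")
        status[key.strip()] = value.strip()
    return status


def replication_problems(status):
    """Return a list of readable problems; an empty list means healthy."""
    problems = []
    for thread in ("Slave_IO_Running", "Slave_SQL_Running"):
        if status.get(thread) != "Yes":
            problems.append(f"{thread} is {status.get(thread)!r}")
    for errno_key, error_key in (("Last_IO_Errno", "Last_IO_Error"),
                                 ("Last_SQL_Errno", "Last_SQL_Error")):
        if status.get(errno_key, "0") != "0":
            problems.append(
                f"{errno_key} {status[errno_key]}: {status.get(error_key, '')}")
    return problems
```

Applied to the output above, the sketch would report the stopped I/O thread and error 1236; a monitoring job could alert whenever the returned list is not empty.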

2. Inspect the master's binlog contents

[backup]# mysqlbinlog mysql-bin.000081 >mysql-bin.log

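The largest pos in the master's binlog file mysql-bin.000081 is 480140557, but the slave is asking to read 'mysql-bin.000081' at 480141113. The pos requested by the slave is clearly larger than any pos that exists on the master, so the read fails.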
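
The comparison can be written down as a small Python sketch. It assumes that the end position of a closed, intact binlog file equals its size in bytes, which avoids dumping the whole file just to find the last pos; the helper names binlog_end_pos and missing_bytes are hypothetical.

```python
import os


def binlog_end_pos(path):
    """End position of a closed binlog file, taken as its size in bytes."""
    return os.path.getsize(path)


def missing_bytes(requested_pos, master_end_pos):
    """Bytes the slave is asking for beyond the end of the master's file."""
    return max(0, requested_pos - master_end_pos)


# Positions taken from this incident.
print(missing_bytes(480141113, 480140557))  # 556
```

Any result above zero means the slave had received events that never reached the master's binlog file on disk.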

The following statement shows the pos information and contents of a binlog:
mysql> show binlog events in 'mysql-bin.000081' from 480140557 limit 10;
Empty set (0.04 sec)

3. Change the slave's replication position and resynchronize

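On the master, inspect the next binlog file:

mysqlbinlog mysql-bin.000082 |more

On the slave, point replication at the start of that file (stop slave first, because the SQL thread is still running):

stop slave;

change master to master_host='xx.xx.xx.xx',master_user='username',master_port=3306,master_password='password',master_log_file='mysql-bin.000082',master_log_pos=4;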

start slave;

show slave status \G

Replication is back to normal.
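
The new coordinates follow a simple rule: the next file in the sequence, at position 4, which is the first event after the 4-byte magic header of a binlog file. A minimal Python sketch of that rule is given below; the helper names are hypothetical, and the generated statement leaves out host and credentials on the assumption that the slave should keep the values it already has.

```python
def next_binlog_file(name):
    """Return the binlog file name that follows `name` in sequence."""
    base, _, seq = name.rpartition(".")
    if not base or not seq.isdigit():
        raise ValueError(f"not a binlog file name: {name!r}")
    # Keep the zero padding of the original sequence number.
    return f"{base}.{int(seq) + 1:0{len(seq)}d}"


def change_master_sql(log_file, log_pos=4):
    """Build a CHANGE MASTER TO statement for the given coordinates.

    Options that are not listed (host, user, password) retain their
    current values on the slave.
    """
    return (f"CHANGE MASTER TO MASTER_LOG_FILE='{log_file}', "
            f"MASTER_LOG_POS={int(log_pos)};")


print(change_master_sql(next_binlog_file("mysql-bin.000081")))
```

Skipping to the next file discards nothing on the master, but the slave may hold events that the master lost, so the data comparison described later is still required.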

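4. Improve the master's parameters

This problem is most likely caused by the master losing power abnormally during replication: events had already been sent to the slave but had not yet been persisted to the master's binlog on disk. In other words, the master's sync_binlog setting may be at fault, so check the parameter on the master (for example with show variables like 'sync_binlog';):

Sure enough, the value is 0, which means MySQL does not actively sync the binlog to disk and instead relies on the operating system to flush the file contents to disk from time to time. A value of 1 is the safest: the binary log is synced after every statement or transaction, so at most one statement or transaction is lost in a crash, but it is also the slowest. With the value 0 here, the power failure lost binlog data that had not reached the master's disk although it had already been replicated to the slave. This easily leaves master and slave inconsistent, so even after replication is restored, the data on both sides still has to be compared and verified.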

mysql> set global sync_binlog=1;
Query OK, 0 rows affected (0.00 sec)

Also set sync_binlog=1 in the configuration file my.cnf so that the setting survives a restart.
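
To confirm that the file really carries the setting, the [mysqld] section can be read back. The Python sketch below uses the standard configparser module and is only an approximation of how MySQL reads option files: it does not follow !include or !includedir directives, and the function name mysqld_option is hypothetical.

```python
import configparser


def mysqld_option(cnf_text, option, section="mysqld"):
    """Return the value of `option` in a section of my.cnf text, or None."""
    parser = configparser.ConfigParser(
        allow_no_value=True,   # my.cnf allows bare flags such as skip-name-resolve
        strict=False,          # tolerate repeated sections and options
        interpolation=None,    # treat % literally
    )
    parser.read_string(cnf_text)
    if not parser.has_section(section):
        return None
    wanted = option.replace("-", "_")
    for key, value in parser.items(section):
        # MySQL treats dashes and underscores in option names alike.
        if key.replace("-", "_") == wanted:
            return value
    return None
```

A result of None or "0" for sync_binlog would mean the running value set with set global is lost at the next restart.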

5. Verify master/slave data

pt-table-checksum h=master_ipaddr,u=username,p='password',P=mysql_port --nocheck-binlog-format --recursion-method=hosts
Checking if all tables can be checksummed ...
Starting checksum ...
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
08-03T17:49:29      0      0      595       1       0   0.186 user.hole
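
With many tables, scanning this report by eye is error-prone. The following Python sketch flags rows whose DIFFS or ERRORS column is non-zero; it assumes the report layout shown above and locates columns by the header names, so additional columns are tolerated. The function name tables_with_diffs is hypothetical.

```python
def tables_with_diffs(output):
    """Return the tables whose DIFFS or ERRORS column is non-zero."""
    header = None
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "TS" and "TABLE" in fields:
            header = fields          # column names: TS ERRORS DIFFS ...
            continue
        if header is None or len(fields) != len(header):
            continue                 # progress messages, not a result row
        row = dict(zip(header, fields))
        if row.get("DIFFS") != "0" or row.get("ERRORS") != "0":
            flagged.append(row["TABLE"])
    return flagged
```

For the run above the list is empty, which is the result that indicates the checked tables match on master and slave.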

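--recursion-method supports several ways of discovering slaves. The hosts method is used here; it requires the following parameters in the slave's configuration so that show slave hosts on the master can list the slave: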

report_host=slave_ip

report_port=slave_port

METHOD       USES
===========  =============================================
processlist  SHOW PROCESSLIST
hosts        SHOW SLAVE HOSTS
cluster      SHOW STATUS LIKE 'wsrep\_incoming\_addresses'
dsn=DSN      DSNs from a table
none         Do not find slaves

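6. More on the innodb_flush_log_at_trx_commit parameter

The innodb_flush_log_at_trx_commit parameter controls how often InnoDB writes the log after a transaction commits. That description is not entirely precise, so here is what each value means and how it behaves.

With innodb_flush_log_at_trx_commit set to 0, the log buffer is written to the log file and flushed to disk once per second. A transaction commit triggers nothing, that is, flushing the log buffer is independent of commits. MySQL performs best in this mode, but a crash of the mysqld process usually loses the last 1s of log.

With a value of 1, the log buffer is written to the log file and flushed to disk at every transaction commit. This is the default and the safest setting, but also the slowest, because every transaction requires disk I/O.

With a value of 2, every commit writes to the log file but does not flush it to disk immediately; the log file is flushed to disk once per second. If the mysqld process crashes, no data is lost because the log has already reached the operating system cache; if the operating system crashes, the last 1s of log is usually lost.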
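
The three values can be summarized as a small lookup, shown here as a Python sketch. It only restates the behaviour described above and is not taken from InnoDB itself; the names BEHAVIOUR and may_lose_committed and the failure labels are hypothetical.

```python
# value -> (written to the log file at commit, flushed to disk at commit)
BEHAVIOUR = {
    0: (False, False),  # written and flushed about once per second
    1: (True, True),    # written and flushed at every commit
    2: (True, False),   # written at commit, flushed about once per second
}


def may_lose_committed(value, failure):
    """Can recently committed transactions be lost for this kind of failure?"""
    written, flushed = BEHAVIOUR[value]
    if failure == "mysqld_crash":
        # Data already handed to the operating system survives a process crash.
        return not written
    if failure == "os_crash":
        # Only data flushed to disk survives an operating system crash.
        return not flushed
    raise ValueError(f"unknown failure type: {failure!r}")
```

For example, may_lose_committed(2, "mysqld_crash") is False while may_lose_committed(2, "os_crash") is True, matching the description of value 2.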
