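Scenario:
Database version when the crash occurred: MySQL-5.7.12; officially marked as fixed in 5.7.17.
In a master-slave setup with semi-synchronous replication enabled, starting or restarting the slave threads on a slave that has semi-sync enabled crashes the master instance.
Conclusion:
This is a MySQL bug. Bug report: https://bugs.mysql.com/bug.php?id=79865
Problem description (excerpt):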
Description: From 5.7,semi-sync add Ack_receiver thread for listening slave ack,which use select(). But select() can only listen socket fd between 1 and __FD_SET_SIZE(my os is 1024), when socket fd is bigger than __FD_SET_SIZE, select() has no effect, and can never get ack from slave,then semi-sync can't run normally.even more,select() use array store fds, when use FD_SET store fd which is bigger than __FD_SET_SIZE, array will overflow,so mysqld may crash。
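The root of the problem is the select() call used on the TCP connections. The operating system uses the macro FD_SETSIZE (written __FD_SET_SIZE in the excerpt above) to define how many file descriptors select() can handle in a process, and this value is usually only 1024.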
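To make the overflow concrete, here is a minimal Python model of the bitmap behind an fd_set. It is only a sketch: the constants assume the usual glibc layout on x86_64 (1024 bits stored in 64-bit words), and `fd_slot` and `fd_set_checked` are hypothetical helper names, not MySQL or glibc functions. The `__fortify_fail` frames in the crash log below are consistent with glibc's fortified bounds check on this bitmap index aborting the process, although that reading is an inference from the backtrace.

```python
FD_SETSIZE = 1024                        # assumption: the usual glibc value on Linux
BITS_PER_WORD = 64                       # assumption: 64-bit long, as on x86_64
NUM_WORDS = FD_SETSIZE // BITS_PER_WORD  # 16 words in one fd_set


def fd_slot(fd):
    """Return (word index, bit index) that FD_SET(fd, ...) would touch."""
    return fd // BITS_PER_WORD, fd % BITS_PER_WORD


def fd_set_checked(bitmap, fd):
    """Set the bit for fd, refusing descriptors that do not fit in the bitmap."""
    if not 0 <= fd < FD_SETSIZE:
        raise ValueError(f"fd {fd} does not fit in an fd_set of {FD_SETSIZE} bits")
    word, bit = fd_slot(fd)
    bitmap[word] |= 1 << bit


bitmap = [0] * NUM_WORDS
fd_set_checked(bitmap, 1023)   # last valid descriptor
print(fd_slot(1023))           # (15, 63): the final bit of the final word
print(fd_slot(4001))           # (62, 33): far past the 16-word array
```

An unchecked FD_SET on descriptor 4001 would write to word 62 of a 16-word array, which is the out-of-bounds write the bug report describes.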
In practice, epoll or poll is the more common choice; select seems to be rarely used nowadays.
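The difference between the two interfaces can be sketched in Python, whose `select` module wraps the same system calls. This is an illustration only, not the MySQL patch: `wait_for_ack` and `select_rejects` are hypothetical names, the value 1024 is an assumption (Python does not export FD_SETSIZE), and the code assumes Linux with CPython.

```python
import select
import socket

FD_SETSIZE = 1024  # assumption: usual Linux value; Python does not export it


def wait_for_ack(sock, timeout_s):
    """Return True if sock becomes readable within timeout_s seconds."""
    fd = sock.fileno()
    if fd < FD_SETSIZE:
        readable, _, _ = select.select([fd], [], [], timeout_s)
        return bool(readable)
    # Descriptor too large for select(): poll() takes the fd number directly,
    # so it has no fixed-size bitmap to overflow.
    poller = select.poll()
    poller.register(fd, select.POLLIN)
    return bool(poller.poll(int(timeout_s * 1000)))  # poll() expects milliseconds


def select_rejects(fd):
    """Return True if Python's select() refuses fd as out of range."""
    try:
        select.select([fd], [], [], 0)
    except ValueError:
        return True
    except OSError:
        return False  # in range, but not an open descriptor
    return False


master_side, slave_side = socket.socketpair()
print(wait_for_ack(master_side, 0))  # nothing sent yet
slave_side.send(b"ack")
print(wait_for_ack(master_side, 1))  # data is now waiting
print(select_rejects(FD_SETSIZE))    # CPython range-checks the fd before calling select()
```

CPython guards against out-of-range descriptors by raising ValueError; a C caller that passes such a descriptor to FD_SET without a comparable check gets no such protection.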
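Reproducing the problem:
Prepare a MySQL-5.7.12 master-slave setup and enable semi-synchronous replication.
To open a large number of file descriptors as simply as possible, this test exploits a "feature" of MyISAM partitioned tables.
Then run the select statement on the master several times in a row (more than 5).
Now check the number of file descriptors held by the master.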
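The command originally used for this check is not preserved here. On Linux, one way to get the same information is to read `/proc/<pid>/fd`, as in the sketch below; `count_open_fds` and `highest_fd` are hypothetical helper names, and the process ID of mysqld has to be supplied by the reader. The highest descriptor number matters more than the count, because a new slave socket whose descriptor lands at or above FD_SETSIZE is what triggers the problem.

```python
import os


def count_open_fds(pid):
    """Count the descriptors listed in /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def highest_fd(pid):
    """Return the largest descriptor number currently open in the process."""
    return max(int(name) for name in os.listdir(f"/proc/{pid}/fd"))


# Demonstrated on the current process; for the test above, pass the PID of
# mysqld instead (reading another user's /proc entry may require root).
pid = os.getpid()
print(count_open_fds(pid), highest_fd(pid))
```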
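Next, restart the slave threads on the slave that has semi-sync enabled, while tailing the master's log.
A few seconds after the threads are restarted, the master crashes.
PS: During testing, after running the select statement several times and confirming that the master's semi-sync status was ON, quickly restarting the slave threads on the slave reproduced the crash almost every time.
PPS: When a MyISAM table is opened, all of its partition files are opened at the same time, which makes it easy to simulate heavy file descriptor usage.
(MyISAM partitioned tables: http://blog.itpub.net/29510932/viewspace-2134679/)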
PPPPPPPS: _(:з」∠)_
Attached: the test script and the crash information.
CREATE TABLE `myisam_t` (
  `id` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY HASH (id)
PARTITIONS 2000 */;
2017-04-28T22:10:00.731611+08:00 5092 [Note] Start binlog_dump to master_thread_id(5092) slave_server(13043), pos(, 4)
2017-04-28T22:10:01.648365+08:00 5092 [Note] Start semi-sync binlog_dump to slave (server_id: 13043), pos(, 4)
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x731af)[0x7fcdfc7981af]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7fcdfc81dcf7]
/lib/x86_64-linux-gnu/libc.so.6(+0xf6f10)[0x7fcdfc81bf10]
/lib/x86_64-linux-gnu/libc.so.6(+0xf8c67)[0x7fcdfc81dc67]
/usr/lib/mysql/plugin/semisync_master.so(_ZN12Ack_receiver17get_slave_socketsEP6fd_set+0x83)[0x7fcc73d4a493]
/usr/lib/mysql/plugin/semisync_master.so(_ZN12Ack_receiver3runEv+0x603)[0x7fcc73d4aaf3]
/usr/lib/mysql/plugin/semisync_master.so(ack_receive_handler+0x19)[0x7fcc73d4aba9]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xe90784]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7fcdfdf650a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fcdfc80d87d]
14:10:01 UTC - mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
Attempting to collect some information that could help diagnose the problem.
As this is a crash and something is definitely wrong, the information
collection process might fail.
key_buffer_size=8388608
read_buffer_size=131072
max_used_connections=5
max_threads=9999
thread_count=8
connection_count=2
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 21899362 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/usr/sbin/mysqld(my_print_stacktrace+0x2c)[0xe77fec]
/usr/sbin/mysqld(handle_fatal_signal+0x459)[0x7a7019]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7fcdfdf6c8d0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x37)[0x7fcdfc75a067]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x148)[0x7fcdfc75b448]
/lib/x86_64-linux-gnu/libc.so.6(+0x731b4)[0x7fcdfc7981b4]
/lib/x86_64-linux-gnu/libc.so.6(__fortify_fail+0x37)[0x7fcdfc81dcf7]
/lib/x86_64-linux-gnu/libc.so.6(+0xf6f10)[0x7fcdfc81bf10]
/lib/x86_64-linux-gnu/libc.so.6(+0xf8c67)[0x7fcdfc81dc67]
/usr/lib/mysql/plugin/semisync_master.so(_ZN12Ack_receiver17get_slave_socketsEP6fd_set+0x83)[0x7fcc73d4a493]
/usr/lib/mysql/plugin/semisync_master.so(_ZN12Ack_receiver3runEv+0x603)[0x7fcc73d4aaf3]
/usr/lib/mysql/plugin/semisync_master.so(ack_receive_handler+0x19)[0x7fcc73d4aba9]
/usr/sbin/mysqld(pfs_spawn_thread+0x1b4)[0xe90784]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x80a4)[0x7fcdfdf650a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fcdfc80d87d]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Permanent link to this article: http://www.linuxidc.com/Linux/2017-05/143386.htm