共计 14995 个字符,预计需要花费 38 分钟才能阅读完成。
MHA 的简要介绍
- MHA 全称 Master High Availability,也就是主节点的高可用,是目前比较成熟的 MySQL 高可用解决方案。它的主要功能主要是通过一个 manager 节点来监控主节点和从节点的状况,并会在主节点发生故障的时候,自动将一个数据最贴近 Master 的从节点转化成主节点。实现自动的故障转移。
- MHA 的变成语言是 Perl,需要安装一些软件包来进行编译操作,但是总体的编译过程十分地简单。
实验拓扑
要点以及基础知识
- MHA 的组件中主要有两个,一个是 Manager 节点组件。类似于一个监督者。
Node 节点组件则是安装于数据库节点,其中一个作为 Master。 - MHA 在主节点发生故障时需要进行主节点自动切换,所以必不可少地需要管理员权限。所以多个节点之间需要基于 ssh 秘钥认证。
- MHA 的主要配置在于 manager。
主机名 | 主机地址 | 角色 |
---|---|---|
node1 | 192.168.2.201 | Master 节点,安装 node 组件 |
node2 | 192.168.2.202 | Slave 节点,安装 node 组件 |
node3 | 192.168.2.203 | Slave 节点,安装 node 组件 |
node4 | 192.168.2.204 | 安装 manager 组件 |
本文使用 CentOS7.1,数据库:MariaDB-5.5.50
关于半同步复制的详细配置,可以参考我的上一篇文章。由于篇幅问题,这里主要讲如何安装配置和使用 MHA 组件。
因为数据库版本是 MariaDB-5.5.50,所以选择编译在 codegoole 上面的 mha4mysql-0.56
注意:本文关闭了 selinux,以及 iptables。
Perl 编译安装
最新版 MHA 下载地址:
mha4mysql-manager
mha4mysql-node题外话
本来代码是在 codegoogle 上面进行托管的,甚至连一些介绍的主页也是在 codegoogle 上面的。
但是由于 github 的出现,很多软件都转移到 github 上边了。codegoole 上面的 rpm 包很多都已经失效。
因为来历不明的 rpm 不敢安装在实际环境中,所以选择使用 perl 编译安装。
(1)在每一个节点上面进行编译环境的安装
yum -y install perl-DBD-MySQL perl-Config-Tiny perl-Log-Dispatch perl-Parallel-ForkManager perl-Config-IniFiles ncftp perl-Params-Validate perl-CPAN perl-Test-Mock-LWP.noarch perl-LWP-Authen-Negotiate.noarch perl-devel perl-ExtUtils-CBuilder perl-ExtUtils-MakeMaker
(2)在 node4 中安装 manager 组件 a. 使用 make Makefile.PL 检查编译环境,功能类似于./configure
其实 node1~node3 这三个配置了半同步复制的数据库节点安装的是 node 组件,但是也是执行这两步。
一般都不会出错。而且 node 节点不用额外配置,所以不做重复演示了。
[root@node4 mha4mysql-manager-0.56]# perl Makefile.PL
*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- DBI ...loaded. (1.627)
- DBD::mysql ...loaded. (4.023)
- Time::HiRes ...loaded. (1.9725)
- Config::Tiny ...loaded. (2.14)
- Log::Dispatch ...loaded. (2.41)
- Parallel::ForkManager ...loaded. (1.05)
- MHA::NodeConst ...loaded. (0.56)
*** Module::AutoInstall configuration finished.
Writing Makefile for mha4mysql::manager
b. 使用 make&&make install 安装
[root@node4 mha4mysql-manager-0.56]# make&&make install
Skip blib/lib/MHA/ManagerUtil.pm (unchanged)
Skip blib/lib/MHA/Config.pm (unchanged)
Skip blib/lib/MHA/HealthCheck.pm (unchanged)
Skip blib/lib/MHA/ManagerConst.pm (unchanged)
Skip blib/lib/MHA/ServerManager.pm (unchanged)
Skip blib/lib/MHA/ManagerAdmin.pm (unchanged)
Skip blib/lib/MHA/FileStatus.pm (unchanged)
Skip blib/lib/MHA/ManagerAdminWrapper.pm (unchanged)
Skip blib/lib/MHA/MasterFailover.pm (unchanged)
Skip blib/lib/MHA/MasterRotate.pm (unchanged)
Skip blib/lib/MHA/MasterMonitor.pm (unchanged)
Skip blib/lib/MHA/SSHCheck.pm (unchanged)
Skip blib/lib/MHA/Server.pm (unchanged)
Skip blib/lib/MHA/DBHelper.pm (unchanged)
cp bin/masterha_stop blib/script/masterha_stop
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_stop
cp bin/masterha_conf_host blib/script/masterha_conf_host
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_conf_host
cp bin/masterha_check_repl blib/script/masterha_check_repl
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_repl
cp bin/masterha_check_status blib/script/masterha_check_status
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_status
cp bin/masterha_master_monitor blib/script/masterha_master_monitor
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_monitor
cp bin/masterha_check_ssh blib/script/masterha_check_ssh
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_check_ssh
cp bin/masterha_master_switch blib/script/masterha_master_switch
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_master_switch
cp bin/masterha_secondary_check blib/script/masterha_secondary_check
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_secondary_check
cp bin/masterha_manager blib/script/masterha_manager
/usr/bin/perl "-Iinc" -MExtUtils::MY -e 'MY->fixin(shift)' -- blib/script/masterha_manager
Manifying blib/man1/masterha_stop.1
Manifying blib/man1/masterha_conf_host.1
Manifying blib/man1/masterha_check_repl.1
Manifying blib/man1/masterha_check_status.1
Manifying blib/man1/masterha_master_monitor.1
Manifying blib/man1/masterha_check_ssh.1
Manifying blib/man1/masterha_master_switch.1
Manifying blib/man1/masterha_secondary_check.1
Manifying blib/man1/masterha_manager.1
Appending installation info to /usr/lib64/perl5/perllocal.pod
数据库节点的配置
半同步复制 Master 节点 Node1
的 MariaDB 配置文件
[mysqld]
datadir=/var/lib/mysql
socket=/var/lib/mysql/mysql.sock
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://Fedoraproject.org/wiki/Systemd
innodb_file_per_table = 1
skip_name_resolve = 1
log_bin = Master-log
log_bin_index = 1
server_id = 1
relay_log=relay-log
relay_log_purge=0
#skip-grant-tables
#skip-networking
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
#
# include all files from the config directory
#
!includedir /etc/my.cnf.d
这里需要注意的是,
半同步复制主节点和从节点都要启动了二进制日志 log_bin = Master-log,中继日志 relay_log=relay-log
而且这里关闭了中继日志的修剪功能 relay_log_purge=0。因为这由 MHA 完成。
半同步复制 Slave 节点 Node2
和node3
的 MariaDB 配置文件
[mysqld]
datadir=/var/lib/mysql/
socket=/var/lib/mysql/mysql.sock
log_bin=Master-bin
# Disabling symbolic-links is recommended to prevent assorted security risks
symbolic-links=0
# Settings user and group are ignored when systemd is used.
# If you need to run mysqld under a different user or group,
# customize your systemd unit file for mariadb according to the
# instructions in http://fedoraproject.org/wiki/Systemd
skip_name_resolve=true
innodb_file_per_table=ture
server_id = 2
log_bin=bin_log
relay_log=relay-log
read_only = 1
relay_log_purge=0
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
#
# include all files from the config directory
#
!includedir /etc/my.cnf.d
这里比 Master 节点多一个 read_only=1
假如 Slave 节点被提升为 Master 节点的话,MHA 会自动将这个 read_only= 1 去掉
并且会将修改其他 Slave 节点指向新的主节点,可以用 show slave status\G 查看。
Manager 节点配置
(1)复制默认文件作为模板,并清空默认配置
cp /etc/masterha/masterha_default.cnf /etc/masterha/app1.cnf
> /etc/masterha/masterha_default.cnf
(2)配置 /etc/masterha/app1.cnf,用于启动 manager 进程的时候指定。MHA 的一个 manager 节点可以通过启动多个进程来监控多个 MHA 集群,所以使用 app1,app2 的方式。
[server default]
#manager_workdir=/var/log/masterha/app1
#manager_log=/var/log/masterha/app1/manager.log
user=root
password=123456789
manager_workdir=/data/masterha/app1
manager_log=/data/masterha/app1/manager.log
remote_workdir=/data/masterha/app1
ssh_user=root
repl_user=repuser
repl_password=repuser
ping_interval=1
[server1]
hostname=node1
candidate_master=1
[server2]
hostname=node2
candidate_master=1
[server3]
hostname=node3
这里的 user 和 password 指的是数据库管理员的账号密码
repl_user 和 repl_password 是具有复制权限的用户和密码
ssh_user=root 是 ssh 的账户,由于是秘钥认证,并不需要密码
配置文件中,hostname=node1 是因为主机可以使用 node1 访问到该主机,这里也可以用 ip 地址。
(3)创建配置文件中 manager_workdir 的工作路径
mkdir /data/masterha/app1/
利用 MHA 的工具测试环境是否正常
(1)测试 ssh 是否连接正常
[root@node4 mha4mysql-manager-0.56]# masterha_check_ssh --conf=/etc/masterha/app1.cnf
Thu Nov 10 22:59:03 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Nov 10 22:59:03 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Nov 10 22:59:03 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Nov 10 22:59:03 2016 - [info] Starting SSH connection tests..
Thu Nov 10 22:59:04 2016 - [debug]
Thu Nov 10 22:59:03 2016 - [debug] Connecting via SSH from root@node1(192.168.2.201:22) to root@node2(192.168.2.202:22)..
Thu Nov 10 22:59:03 2016 - [debug] ok.
Thu Nov 10 22:59:03 2016 - [debug] Connecting via SSH from root@node1(192.168.2.201:22) to root@node3(192.168.2.203:22)..
Thu Nov 10 22:59:03 2016 - [debug] ok.
Thu Nov 10 22:59:04 2016 - [debug]
Thu Nov 10 22:59:03 2016 - [debug] Connecting via SSH from root@node2(192.168.2.202:22) to root@node1(192.168.2.201:22)..
Thu Nov 10 22:59:04 2016 - [debug] ok.
Thu Nov 10 22:59:04 2016 - [debug] Connecting via SSH from root@node2(192.168.2.202:22) to root@node3(192.168.2.203:22)..
Thu Nov 10 22:59:04 2016 - [debug] ok.
Thu Nov 10 22:59:05 2016 - [debug]
Thu Nov 10 22:59:04 2016 - [debug] Connecting via SSH from root@node3(192.168.2.203:22) to root@node1(192.168.2.201:22)..
Thu Nov 10 22:59:04 2016 - [debug] ok.
Thu Nov 10 22:59:04 2016 - [debug] Connecting via SSH from root@node3(192.168.2.203:22) to root@node2(192.168.2.202:22)..
Thu Nov 10 22:59:05 2016 - [debug] ok.
Thu Nov 10 22:59:05 2016 - [info] All SSH connection tests passed successfully.
这么多输出信息,其实只看最后一句就知道 ssh 是否正常了
这里需要注意的是这里指定了刚才配置的 app1.
(2)测试复制功能是否正常
[root@node4 mha4mysql-manager-0.56]# masterha_check_repl --conf=/etc/masterha/app1.cnf
Thu Nov 10 23:07:35 2016 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Thu Nov 10 23:07:35 2016 - [info] Reading application default configuration from /etc/masterha/app1.cnf..
Thu Nov 10 23:07:35 2016 - [info] Reading server configuration from /etc/masterha/app1.cnf..
Thu Nov 10 23:07:35 2016 - [info] MHA::MasterMonitor version 0.56.
Thu Nov 10 23:07:35 2016 - [info] GTID failover mode = 0
Thu Nov 10 23:07:35 2016 - [info] Dead Servers:
Thu Nov 10 23:07:35 2016 - [info] Alive Servers:
Thu Nov 10 23:07:35 2016 - [info] node1(192.168.2.201:3306)
Thu Nov 10 23:07:35 2016 - [info] node2(192.168.2.202:3306)
Thu Nov 10 23:07:35 2016 - [info] node3(192.168.2.203:3306)
Thu Nov 10 23:07:35 2016 - [info] Alive Slaves:
Thu Nov 10 23:07:35 2016 - [info] node2(192.168.2.202:3306) Version=5.5.50-MariaDB (oldest major version between slaves) log-bin:enabled
Thu Nov 10 23:07:35 2016 - [info] Replicating from 192.168.2.201(192.168.2.201:3306)
Thu Nov 10 23:07:35 2016 - [info] Primary candidate for the new Master (candidate_master is set)
Thu Nov 10 23:07:35 2016 - [info] node3(192.168.2.203:3306) Version=5.5.50-MariaDB (oldest major version between slaves) log-bin:enabled
Thu Nov 10 23:07:35 2016 - [info] Replicating from 192.168.2.201(192.168.2.201:3306)
Thu Nov 10 23:07:35 2016 - [info] Current Alive Master: node1(192.168.2.201:3306)
Thu Nov 10 23:07:35 2016 - [info] Checking slave configurations..
Thu Nov 10 23:07:35 2016 - [warning] relay_log_purge=0 is not set on slave node3(192.168.2.203:3306).
Thu Nov 10 23:07:35 2016 - [info] Checking replication filtering settings..
Thu Nov 10 23:07:35 2016 - [info] binlog_do_db= , binlog_ignore_db=
Thu Nov 10 23:07:35 2016 - [info] Replication filtering check ok.
Thu Nov 10 23:07:35 2016 - [info] GTID (with auto-pos) is not supported
Thu Nov 10 23:07:35 2016 - [info] Starting SSH connection tests..
Thu Nov 10 23:07:37 2016 - [info] All SSH connection tests passed successfully.
Thu Nov 10 23:07:37 2016 - [info] Checking MHA Node version..
Thu Nov 10 23:07:37 2016 - [info] Version check ok.
Thu Nov 10 23:07:37 2016 - [info] Checking SSH publickey authentication settings on the current master..
Thu Nov 10 23:07:37 2016 - [info] HealthCheck: SSH to node1 is reachable.
Thu Nov 10 23:07:37 2016 - [info] Master MHA Node version is 0.56.
Thu Nov 10 23:07:37 2016 - [info] Checking recovery script configurations on node1(192.168.2.201:3306)..
Thu Nov 10 23:07:37 2016 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=Master-log.000006
Thu Nov 10 23:07:37 2016 - [info] Connecting to root@192.168.2.201(node1:22)..
Creating /data/masterha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /var/lib/mysql, up to Master-log.000006
Thu Nov 10 23:07:38 2016 - [info] Binlog setting check done.
Thu Nov 10 23:07:38 2016 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Thu Nov 10 23:07:38 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=node2 --slave_ip=192.168.2.202 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.50-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Thu Nov 10 23:07:38 2016 - [info] Connecting to root@192.168.2.202(node2:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-log.000004
Temporary relay log file is /var/lib/mysql/relay-log.000004
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Thu Nov 10 23:07:38 2016 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='root' --slave_host=node3 --slave_ip=192.168.2.203 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.50-MariaDB --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx
Thu Nov 10 23:07:38 2016 - [info] Connecting to root@192.168.2.203(node3:22)..
Checking slave recovery environment settings..
Opening /var/lib/mysql/relay-log.info ... ok.
Relay log found at /var/lib/mysql, up to relay-log.000002
Temporary relay log file is /var/lib/mysql/relay-log.000002
Testing mysql connection and privileges.. done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Thu Nov 10 23:07:38 2016 - [info] Slaves settings check done.
Thu Nov 10 23:07:38 2016 - [info]
node1(192.168.2.201:3306) (current master)
+--node2(192.168.2.202:3306)
+--node3(192.168.2.203:3306)
Thu Nov 10 23:07:38 2016 - [info] Checking replication health on node2..
Thu Nov 10 23:07:38 2016 - [info] ok.
Thu Nov 10 23:07:38 2016 - [info] Checking replication health on node3..
Thu Nov 10 23:07:38 2016 - [info] ok.
Thu Nov 10 23:07:38 2016 - [warning] master_ip_failover_script is not defined.
Thu Nov 10 23:07:38 2016 - [warning] shutdown_script is not defined.
Thu Nov 10 23:07:38 2016 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
(3)最激动人心的时刻到了,启动服务!
[root@node4 mha4mysql-manager-0.56]# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1 &
[1] 8463
(4)查看 masterha 是否正在正常运行,还有主节点信息。
[root@node4 mha4mysql-manager-0.56]# masterha_check_status --conf=/etc/masterha/app1.cnf
app1 (pid:8463) is running(0:PING_OK), master:node1
模拟 MHA 故障
(1)Master 节点·node1·关闭 MariaDB
systemctl stop mariadb.service
(2)查看 manager 节点的状况
[root@node4 mha4mysql-manager-0.56]# masterha_check_status --conf=/etc/masterha/app1.cnfapp1 is stopped(2:NOT_RUNNING).
[1]+ Done nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/masterha/app1/manager.log 2>&1
可以看出 MHA 程序 masterha_manager 已经退出了
同时还要注意一点,在工作路径 /data/masterha/app1/ 下会生成一个 app1.failover.complete 的文件。
如果需要启动的时候,最好删除这个文件,否则会启动失败。
(3)去 node3 查看 slave 信息,node3 指向新的 Master 节点。
MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 192.168.2.202
Master_User: repuser
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: bin_log.000002
Read_Master_Log_Pos: 245
Relay_Log_File: relay-log.000002
Relay_Log_Pos: 527
Relay_Master_Log_File: bin_log.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 245
Relay_Log_Space: 815
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 2
(4)node2 原本作为从节点所设置的只读属性也自动取消了。
MariaDB [(none)]> show variables like '%read_only%';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| read_only | OFF |
+---------------+-------+
1 row in set (0.00 sec)
(5)灾后重建的步骤
我们知道,当时原有 master 故障的时候,masterha_manager 会通过二进制日志和中继日志的状况,选举出新的 master 节点,并由只读状态改为可读写的状态会退出。
所以接下来要怎么做呢?a. 删除工作路径下的 failover.complete 文件。
如 /data/masterha/app1/app1.failover.complete
b. 原有的 master,也就是 node1 节点。
需要清空数据库,再将 node2 全备一次,恢复到 node1 上面来
并配置 node1 为 Slave 节点,并指向新的节点 node2
c. 重新通过 masterha_check 等工具检测环境是否正常,并重新启动 MHA 的主程序 masterha_manager。
本文永久更新链接地址:http://www.linuxidc.com/Linux/2017-10/147555.htm