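pt-heartbeat is a tool for monitoring replication lag between a master and its slaves. As is well known, the traditional approach of judging lag from the Seconds_Behind_Master value in SHOW SLAVE STATUS\G is not reliable.
pt-heartbeat takes a neat approach: on the master it writes a record carrying the current time (MySQL's NOW() function) into a heartbeat table, and that record is then replicated to the slave. On the slave, the tool subtracts the value recorded in the heartbeat table from the current system timestamp (Perl's time function) to determine the replication lag. For details, see the description of the --skew option below.
Common usage:
On the master
Use the --update option: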
# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 -D test
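Here, --update refreshes the record in the heartbeat table once per second, and -D names the database that holds the heartbeat table.
-D is the short form of --database. With --database the value can follow directly, as in --database test, or after an equals sign, as in --database=test. The short form takes its value without an equals sign, as in -D test.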
# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 --database test
# pt-heartbeat --update -h 192.168.244.10 -u monitor -p monitor123 --database=test
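Note: on the first run, add the --create-table option to create the heartbeat table and insert the first record. You can also add --daemonize to run the script as a background process.
On the slave
Use either the --monitor or the --check option.
--monitor measures continuously and keeps printing results:
# pt-heartbeat -D test --monitor -h 192.168.244.20 --master-server-id=1 -u monitor -p monitor123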
10061.00s [167.68s, 33.54s, 11.18s ]
10062.00s [335.38s, 67.08s, 22.36s ]
10063.01s [503.10s, 100.62s, 33.54s ]
...
--check measures the lag once and then exits:
# pt-heartbeat -D test --check -h 192.168.244.20 --master-server-id=1 -u monitor -p monitor123
10039.00
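Note: --update, --monitor, and --check are mutually exclusive, and --daemonize cannot be combined with --check.
The options in detail
--ask-pass
Prompts for the password when connecting to the database.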
Prompt for a password when connecting to MySQL.
--charset
Default character set. (Personally, I find this option of little practical use.)
short form: -A; type: string
Default character set. If the value is utf8, sets Perl’s binmode on STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any other value sets binmode
on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.
--check
Checks the slave's lag once and exits. In a cascaded replication setup you can also pass --recurse, in which case the lag of the slave's own slaves is checked as well.
Check slave delay once and exit. If you also specify --recurse, the tool will try to discover slaves of the given slave and check and print their lag, too. The hostname or IP and port for each slave is printed before its
delay. --recurse only works with MySQL.
--check-read-only
Checks whether the server is read-only. If it is, the insert is skipped.
Check if the server has read_only enabled; If it does, the tool skips doing any inserts.
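--config
Read this comma-separated list of config files; if specified, this must be the first option on the command line.
This option lets you keep the options in a configuration file. A few points to note:
1> Write # pt-heartbeat --config pt-heartbeat.conf, not # pt-heartbeat --config=pt-heartbeat.conf
2> The configuration file only supports the following forms:
option
option=value
An option must not be prefixed with --, and it must be spelled out in full rather than abbreviated; for example, write database, not -D.
For the exact syntax, see: https://www.percona.com/doc/percona-toolkit/2.1/configuration_files.html
Here is an example: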
# cat pt-heartbeat.conf
host=192.168.244.20
user=monitor
password=monitor123
monitor
database=test
master-server-id=1
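To make the file format concrete, here is a minimal Python sketch that turns lines written in this style into the equivalent long command-line options. The function name config_to_args is hypothetical, and skipping blank lines and # comments is an assumption; this only illustrates the option / option=value convention and is not how the tool itself parses the file.

```python
def config_to_args(text):
    """Convert 'option' / 'option=value' lines into '--option[=value]' arguments."""
    args = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and comments are ignored
        name, sep, value = line.partition("=")
        name = name.strip()
        # A bare option becomes a flag; option=value keeps its value.
        args.append(f"--{name}={value.strip()}" if sep else f"--{name}")
    return args


example = """\
host=192.168.244.20
user=monitor
password=monitor123
monitor
database=test
master-server-id=1
"""

print(config_to_args(example))
```

The bare monitor line maps to the --monitor flag, while every other line maps to a long option with a value.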
--create-table
Creates the heartbeat table if it does not exist. The table is identified by the --database and --table options, and its definition is given in the official description that follows.
Create the heartbeat --table if it does not exist.
This option causes the table specified by --database and --table to be created with the following table definition:
CREATE TABLE heartbeat (
  ts                    varchar(26) NOT NULL,
  server_id             int unsigned NOT NULL PRIMARY KEY,
  file                  varchar(255) DEFAULT NULL,    -- SHOW MASTER STATUS
  position              bigint unsigned DEFAULT NULL, -- SHOW MASTER STATUS
  relay_master_log_file varchar(255) DEFAULT NULL,    -- SHOW SLAVE STATUS
  exec_master_log_pos   bigint unsigned DEFAULT NULL  -- SHOW SLAVE STATUS
);
The heartbeat table requires at least one row. If you manually create the heartbeat table, then you must insert a row by doing:
INSERT INTO heartbeat (ts, server_id) VALUES (NOW(), N);
or if using --utc:
INSERT INTO heartbeat (ts, server_id) VALUES (UTC_TIMESTAMP(), N);
where N is the server’s ID; do not use @@server_id because it will replicate and slaves will insert their own server ID instead of the master’s server ID.
This is done automatically by --create-table.
A legacy version of the heartbeat table is still supported:
CREATE TABLE heartbeat (id int NOT NULL PRIMARY KEY,
ts datetime NOT NULL
);
Legacy tables do not support --update instances on each slave of a multi-slave hierarchy like "master -> slave1 -> slave2". To manually insert the one required row into a legacy table:
INSERT INTO heartbeat (id, ts) VALUES (1, NOW());
or if using --utc:
INSERT INTO heartbeat (id, ts) VALUES (1, UTC_TIMESTAMP());
The tool automatically detects if the heartbeat table is legacy.
--create-table-engine
Specifies the storage engine of the heartbeat table.
type: string
Sets the engine to be used for the heartbeat table. The default storage engine is InnoDB as of MySQL 5.5.5.
--daemonize
Runs the script as a daemon, so it keeps running even after the terminal that started it is closed.
Fork to the background and detach from the shell. POSIX operating systems only.
--database
Specifies the database that holds the heartbeat table.
short form: -D; type: string
The database to use for the connection.
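--dbi-driver
pt-heartbeat can measure heartbeat lag not only between MySQL servers but also between PostgreSQL servers.
This option selects the driver used for the connection. The default is mysql, and Pg can be specified instead.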
default: mysql; type: string
Specify a driver for the connection; mysql and Pg are supported.
--defaults-file
Specifies the location of the MySQL options file. The path must be absolute.
short form: -F; type: string
Only read mysql options from the given file. You must give an absolute pathname.
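--file
Writes the latest --monitor output to a file. Note the word "latest": each new result overwrites the previous one.
Without this option, the --monitor output goes straight to the terminal. It is usually combined with --daemonize.
For example:
# pt-heartbeat -D test --monitor -h 192.168.244.20 --master-server-id=1 -u monitor -p monitor123 --file=result
This command creates a file named result in the current directory that holds the most recent measurement:
# cat result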
1376.00s [1126.25s, 225.25s, 75.08s ]
type: string
Print latest --monitor output to this file.
When --monitor is given, prints output to the specified file instead of to STDOUT. The file is opened, truncated, and closed every interval, so it will only contain the most recent statistics. Useful when --daemonize
is given.
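--frames
The time windows for the statistics. The default is 1m,5m,15m, meaning the averages are computed over the last 1, 5, and 15 minutes.
The unit can be s, m, h, or d. Note that the larger the window, the more samples have to be buffered and the more memory is used.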
type: string; default: 1m,5m,15m
Timeframes for averages.
Specifies the timeframes over which to calculate moving averages when --monitor is given. Specify as a comma-separated list of numbers with suffixes. The suffix can be s for seconds, m for minutes, h for hours, or d
for days. The size of the largest frame determines the maximum memory usage, as up to the specified number of per-second samples are kept in memory to calculate the averages. You can specify as many timeframes as
you like.
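The sample --monitor output near the top of this article is consistent with each average being the sum of the buffered per-second samples divided by the frame length in seconds (for instance 10061.00 / 60 ≈ 167.68). The Python sketch below reproduces those three output lines under that assumption. It is inferred from the output shown, not taken from the tool's source, and the names parse_frame and FrameAverages are hypothetical.

```python
from collections import deque


def parse_frame(spec):
    """Convert a frame such as '1m' or '30s' into a number of seconds."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    return int(spec[:-1]) * units[spec[-1]]


class FrameAverages:
    def __init__(self, frames="1m,5m,15m"):
        self.frames = [parse_frame(f) for f in frames.split(",")]
        # Only as many samples as the largest frame needs are kept in memory.
        self.samples = deque(maxlen=max(self.frames))

    def add(self, delay):
        """Record one per-second delay sample and return one average per frame."""
        self.samples.appendleft(delay)
        recent = list(self.samples)
        # Assumption: the sum of the newest samples is divided by the frame
        # length, so a frame that is not yet full reads lower than the raw delay.
        return [sum(recent[:n]) / n for n in self.frames]


avg = FrameAverages()
for delay in (10061.00, 10062.00, 10063.01):
    values = avg.add(delay)
    print(f"{delay:.2f}s [" + ", ".join(f"{v:.2f}s" for v in values) + " ]")
```

Because the buffer is capped at the largest frame, memory use grows with that frame, which matches the note about memory consumption above.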
--help
Show help and exit.
--host
Specifies the host to connect to; the short form is -h.
short form: -h; type: string
Connect to host.
--[no]insert-heartbeat-row
The official explanation is as follows:
default: yes
Insert a heartbeat row in the --table if one doesn’t exist.
The heartbeat --table requires a heartbeat row, else there’s nothing to --update, --monitor, or --check! By default, the tool will insert a heartbeat row if one is not already present. You can disable this
feature by specifying --no-insert-heartbeat-row in case the database user does not have INSERT privileges.
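In practice, when the following command is run,
# pt-heartbeat -D test --update -h 192.168.244.10 -u monitor -p monitor123
a record is inserted automatically if the heartbeat table is empty.
If --no-insert-heartbeat-row is given, however, no record is created, and the tool prints the following message: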
# pt-heartbeat -D test --update -h 192.168.244.10 -u monitor -p monitor123 --no-insert-heartbeat-row
No row found in heartbeat table for server_id 1.
At least one row must be inserted into the heartbeat table for server_id 1.
Please read the DESCRIPTION section of the pt-heartbeat POD.
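PS: During testing I found that the option name is not matched strictly: passing --no-insert-heartbeat or --insert-heartbeat raises no error, whereas passing --123-insert-heartbeat-ro fails with "Unknown option: 123-insert-heartbeat-ro". The likely explanation is that Perl's Getopt::Long accepts any unambiguous abbreviation of an option name, so the shortened forms are treated as the full option.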
--interval
How often the heartbeat table is updated or checked. The default is 1s.
type: float; default: 1.0
How often to update or check the heartbeat --table. Updates and checks begin on the first whole second then repeat every --interval seconds for --update and every --interval plus --skew seconds for
--monitor.
For example, if at 00:00.4 an --update instance is started at 0.5 second intervals, the first update happens at 00:01.0, the next at 00:01.5, etc. If at 00:10.7 a --monitor instance is started at 0.05 second intervals with
the default 0.5 second --skew, then the first check happens at 00:11.5 (00:11.0 + 0.5) which will be --skew seconds after the last update which, because the instances are checking at synchronized intervals, happened at
00:11.0.
The tool waits for and begins on the first whole second just to make the interval calculations simpler. Therefore, the tool could wait up to 1 second before updating or checking.
The minimum (fastest) interval is 0.01, and the maximum precision is two decimal places, so 0.015 will be rounded to 0.02.
If a legacy heartbeat table (see --create-table) is used, then the maximum precision is 1s because the ts column is type datetime.
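The two timing examples in the official text can be worked through with a small Python sketch. The function first_ticks is hypothetical. It assumes the tool waits for the next whole second and that a --monitor instance shifts every check by --skew; what happens when an instance starts exactly on a whole second is not covered here.

```python
import math


def first_ticks(start, interval, skew=0.0, count=3):
    """Return the first `count` tick times (in seconds) for an instance started at `start`."""
    # Assumption: wait for the next whole second, then add the skew (0 for --update).
    first = math.ceil(start) + skew
    return [round(first + i * interval, 2) for i in range(count)]


# --update started at 00:00.4 with --interval 0.5
print(first_ticks(0.4, 0.5))
# --monitor started at 00:10.7 with --interval 0.05 and the default --skew 0.5
print(first_ticks(10.7, 0.05, skew=0.5))
```

The first call yields ticks at 1.0, 1.5, and 2.0 seconds, and the second starts at 11.5 seconds, matching the 00:01.0 and 00:11.5 values quoted above.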
--log
When the script runs as a daemon, writes its output to the file given by --log.
type: string
Print all output to this file when daemonized.
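--master-server-id
Specifies the master's server_id. It must be given when measuring a slave's lag if the tool cannot determine the slave's master on its own; otherwise the following error is reported: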
The --master-server-id option must be specified because the heartbeat table `test`.`heartbeat` uses the server_id column for --update or --check but the server's master could not be automatically determined.
type: string
Calculate delay from this master server ID for --monitor or --check. If not given, pt-heartbeat attempts to connect to the server’s master and determine its server id.
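--monitor
Continuously checks and prints the slave's lag.
How often it checks and prints is determined by --interval, which defaults to 1s.
Note how it differs from --check:
1> --monitor prints continuously, whereas --check measures once and exits.
2> --monitor can be combined with --file, whereas --file has no effect with --check.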
Monitor slave delay continuously.
Specifies that pt-heartbeat should check the slave’s delay every second and report to STDOUT (or if --file is given, to the file instead). The output is the current delay followed by moving averages over the timeframe
given in --frames. For example,
5s [0.25s, 0.05s, 0.02s ]
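If another script consumes this output, each line has to be split back into numbers. Here is a minimal Python sketch for that, assuming the line layout shown in the examples in this article (current delay, then the bracketed averages). The name parse_monitor_line and the regular expression are hypothetical and not part of pt-heartbeat.

```python
import re

# Current delay, an "s", then a bracketed comma-separated list of averages.
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s*\[\s*(.*?)\s*\]")


def parse_monitor_line(line):
    """Return (current_delay, [averages]) parsed from one --monitor output line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unrecognized --monitor line: {line!r}")
    current = float(match.group(1))
    averages = [float(part.strip().rstrip("s")) for part in match.group(2).split(",")]
    return current, averages


print(parse_monitor_line("10061.00s [167.68s, 33.54s, 11.18s ]"))
print(parse_monitor_line("5s [0.25s, 0.05s, 0.02s ]"))
```

The number of averages returned follows whatever --frames was set to, since the list is split on commas rather than fixed at three entries.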
--password
Specifies the login password; the short form is -p.
short form: -p; type: string
Password to use when connecting. If password contains commas they must be escaped with a backslash: "exam\,ple"
--pid
Creates a PID file.
type: string
Create the given PID file. The tool won’t start if the PID file already exists and the PID it contains is different than the current PID. However, if the PID file exists and the PID it contains is no longer running, the tool will
overwrite the PID file with the current PID. The PID file is removed automatically when the tool exits.
--port
Specifies the port to connect to; the short form is -P.
short form: -P; type: int
Port number to use for connection.
--print-master-server-id
Also prints the master's server_id. With --monitor, the default output is
1272.00s [21.20s, 4.24s, 1.41s ]
With this option, the output becomes
1272.00s [21.20s, 4.24s, 1.41s ] 1
Print the auto-detected or given --master-server-id. If --check or --monitor is specified, specifying this option will print the auto-detected or given --master-server-id at the end of each line.
--recurse
In --check mode, checks the lag of slaves in a cascaded replication setup. The value of --recurse is the depth of the cascade to descend.
type: int
Check slaves recursively to this depth in --check mode.
Try to discover slave servers recursively, to the specified depth. After discovering servers, run the check on each one of them and print the hostname (if possible), followed by the slave delay.
This currently works only with MySQL. See --recursion-method.
--recursion-method
The method used to find slaves in a cascaded replication setup. There are two: SHOW PROCESSLIST and SHOW SLAVE HOSTS.
type: array; default: processlist,hosts
Preferred recursion method used to find slaves.
Possible methods are:
METHOD       USES
===========  ==================
processlist  SHOW PROCESSLIST
hosts        SHOW SLAVE HOSTS
none         Do not find slaves
The processlist method is preferred because SHOW SLAVE HOSTS is not reliable. However, the hosts method is required if the server uses a non-standard port (not 3306). Usually pt-heartbeat does the right thing and finds
the slaves, but you may give a preferred method and it will be used first. If it doesn’t find any slaves, the other methods will be tried.
--replace
In --update mode the record is refreshed with an UPDATE statement by default. When you are not sure whether the heartbeat table contains any record, you can use REPLACE instead.
Note: if the record is refreshed with UPDATE and the heartbeat table is truncated while the script is running, the script does not exit abnormally, but no new record appears in the heartbeat table either.
With REPLACE, a new record is still written to the heartbeat table even in that situation. Personally, I find REPLACE the more reliable way to refresh the record.
Use REPLACE instead of UPDATE for --update.
When running in --update mode, use REPLACE instead of UPDATE to set the heartbeat table’s timestamp. The REPLACE statement is a MySQL extension to SQL. This option is useful when you don’t know whether
the table contains any rows or not. It must be used in conjunction with --update.
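The difference between the two statements on an empty table can be seen with a small Python sketch. It uses an in-memory SQLite database as a stand-in for MySQL (SQLite also accepts REPLACE INTO) and a simplified two-column version of the heartbeat table, so it only illustrates the behavior described above and is not a test of pt-heartbeat itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified stand-in for the heartbeat table (only two of its columns).
conn.execute(
    "CREATE TABLE heartbeat (ts TEXT NOT NULL, server_id INTEGER NOT NULL PRIMARY KEY)"
)

# The table is empty, as it would be right after a TRUNCATE.
cur = conn.execute(
    "UPDATE heartbeat SET ts = ? WHERE server_id = ?", ("2016-09-25T13:04:06", 1)
)
rows_after_update = conn.execute("SELECT COUNT(*) FROM heartbeat").fetchone()[0]
print("UPDATE matched", cur.rowcount, "row(s); table now holds", rows_after_update)

# REPLACE inserts the row when it is missing and overwrites it when it exists.
conn.execute("REPLACE INTO heartbeat (ts, server_id) VALUES (?, ?)", ("2016-09-25T13:04:07", 1))
conn.execute("REPLACE INTO heartbeat (ts, server_id) VALUES (?, ?)", ("2016-09-25T13:04:08", 1))
rows_after_replace = conn.execute("SELECT ts FROM heartbeat").fetchall()
print("After two REPLACE statements:", rows_after_replace)
```

The UPDATE succeeds without touching any row, which is why the script keeps running silently after a truncate, while REPLACE recreates the single heartbeat row and then keeps overwriting it.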
--run-time
How long the script runs before exiting. It applies to both --update and --monitor.
type: time
Time to run before exiting.
--sentinel
The "sentinel": the script exits if the specified file exists. The default is /tmp/pt-heartbeat-sentinel.
type: string; default: /tmp/pt-heartbeat-sentinel
Exit if this file exists.
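In my tests, even without the --sentinel option, the script exits as soon as it starts if /tmp/pt-heartbeat-sentinel exists.
The purpose of --sentinel is to point the script at a sentinel file of your own.
For example, if /root/123 does not exist when the following command is run, the script keeps running; if the file is created while the script is running, the script exits immediately.
# pt-heartbeat -D test --update -h 192.168.244.10 -u monitor -p monitor123 --sentinel=/root/123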
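The sentinel pattern itself is simple: check for the file before each unit of work and stop once it appears. The Python sketch below shows the idea with a temporary directory. The function run_until_sentinel, the file name stop, and the iteration cap are all hypothetical and only mimic the behavior observed above.

```python
import os
import tempfile
import time


def run_until_sentinel(sentinel, work, interval=0.01, max_iterations=100):
    """Call work() every `interval` seconds until the sentinel file exists."""
    iterations = 0
    while iterations < max_iterations:
        if os.path.exists(sentinel):
            break  # sentinel present: stop, even on the very first pass
        work()
        iterations += 1
        time.sleep(interval)
    return iterations


with tempfile.TemporaryDirectory() as tmp:
    sentinel = os.path.join(tmp, "stop")  # hypothetical sentinel path
    ticks = []

    def work():
        ticks.append(len(ticks))
        if len(ticks) == 3:
            open(sentinel, "w").close()  # simulate an operator creating the file

    completed = run_until_sentinel(sentinel, work)
    print("iterations before exit:", completed)
```

Because the check comes before the work, a sentinel file that already exists stops the loop before anything runs, which mirrors the immediate exit seen with a leftover /tmp/pt-heartbeat-sentinel.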
--slave-user
Sets the user for connecting to the slaves.
type: string
Sets the user to be used to connect to the slaves. This parameter allows you to have a different user with less privileges on the slaves but that user must exist on all slaves.
--slave-password
Sets the password of the user for connecting to the slaves.
type: string
Sets the password to be used to connect to the slaves. It can be used with --slave-user and the password for the user must be the same on all slaves.
--set-vars
Sets session variables for the script's interaction with MySQL, though it does not seem to be of much use.
type: Array
Set the MySQL variables in this comma-separated list of variable=value pairs.
By default, the tool sets:
wait_timeout=10000
Variables specified on the command line override these defaults. For example, specifying --set-vars wait_timeout=500 overrides the default value of 10000.
The tool prints a warning and continues if a variable cannot be set.
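The override rule can be illustrated with a short Python sketch that merges a --set-vars style string over the documented default. The function build_session_vars is hypothetical; it only shows how comma-separated variable=value pairs take precedence over wait_timeout=10000 and says nothing about how the tool applies the variables.

```python
def build_session_vars(set_vars=""):
    """Merge user-supplied variable=value pairs over the documented default."""
    variables = {"wait_timeout": "10000"}  # default stated in the documentation
    for pair in filter(None, (p.strip() for p in set_vars.split(","))):
        name, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected variable=value, got {pair!r}")
        variables[name.strip()] = value.strip()  # later values win
    return variables


print(build_session_vars())
print(build_session_vars("wait_timeout=500"))
```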
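--skew
How long a check is delayed relative to the update. The default is 0.5 seconds.
In other words, after --update writes a record, --check waits 0.5 seconds and then measures the replication lag for that update.
You may wonder how the script knows when the record was updated. In fact, every --update happens right on a whole-second boundary; one recorded value, for example, is "2016-09-25T13:04:06.003130". Then, 0.5s later, the script reads the system time on the slave and subtracts the value recorded in the heartbeat table to get the replication lag. This requires the system clocks on the master and the slave to be in sync; otherwise the result is meaningless.
The source below implements this and is the core logic of the whole script.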
my ($ts, $hostname, $server_id) = $sth->fetchrow_array();
my $now = time;
PTDEBUG && _d("Heartbeat from server", $server_id, "\n",
   " now:", ts($now, $utc), "\n",
   " ts:", $ts, "\n",
   "skew:", $skew);
my $delay = $now - unix_timestamp($ts, $utc) - $skew;
PTDEBUG && _d('Delay', sprintf('%.6f', $delay), 'on', $hostname);
# Because we adjust for skew, if the ts are less than skew seconds
# apart (i.e. replication is very fast) then delay will be negative.
# So it's effectively 0 seconds of lag.
$delay = 0.00 if $delay < 0;
type: float; default: 0.5
How long to delay checks.
The default is to delay checks one half second. Since the update happens as soon as possible after the beginning of the second on the master, this allows one half second of replication delay before reporting that the slave lags
the master by one second. If your clocks are not completely accurate or there is some other reason you’d like to delay the slave more or less, you can tweak this value. Try setting the PTDEBUG environment variable to see
the effect this has.
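The arithmetic of the Perl excerpt quoted above (now minus heartbeat timestamp minus skew, clamped at zero) can be replayed in Python with the sample timestamp from this section. The function replication_delay is hypothetical, and treating the heartbeat value as a UTC ISO-8601 string is an assumption made so that the example does not depend on the local time zone.

```python
from datetime import datetime, timezone


def replication_delay(now, heartbeat_ts, skew=0.5):
    """Return now - heartbeat timestamp - skew, clamped at zero."""
    # Assumption: the heartbeat value is a UTC ISO-8601 string, as with --utc.
    ts = datetime.fromisoformat(heartbeat_ts).replace(tzinfo=timezone.utc).timestamp()
    delay = now - ts - skew
    return delay if delay > 0 else 0.0


heartbeat = "2016-09-25T13:04:06.003130"
written = datetime(2016, 9, 25, 13, 4, 6, 3130, tzinfo=timezone.utc).timestamp()

# Checked 0.5 s after the write: the skew cancels out.
print(round(replication_delay(written + 0.5, heartbeat), 2))
# Checked 3.5 s after the write: 3 s remain after subtracting the skew.
print(round(replication_delay(written + 3.5, heartbeat), 2))
# Checked 0.2 s after the write: the negative result is clamped to 0.
print(round(replication_delay(written + 0.2, heartbeat), 2))
```

The clamp in the last case corresponds to the final line of the Perl excerpt: when replication is faster than the skew, the reported lag is 0 rather than a negative number.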
--socket
short form: -S; type: string
Socket file to use for connection.
--table
Specifies the name of the heartbeat table. The default is heartbeat.
type: string; default: heartbeat
The table to use for the heartbeat.
Don’t specify database.table; use --database to specify the database.
See --create-table.
--update
Updates the record in the heartbeat table on the master.
Update a master’s heartbeat.
--user
Specifies the user to connect as.
short form: -u; type: string
User for login if not current user.
--utc
Ignores the system time zone and uses UTC. If you use this option, it must be given to all of --update, --monitor, and --check.
Ignore system time zones and use only UTC. By default pt-heartbeat does not check or adjust for different system or MySQL time zones which can cause the tool to compute the lag incorrectly. Specifying this option is
a good idea because it ensures that the tool works correctly regardless of time zones.
If used, this option must be used for all pt-heartbeat instances: --update, --monitor, --check, etc.
You should probably set the option in a --config file. Mixing this option with pt-heartbeat instances not using this option will cause false-positive lag readings due to different time zones (unless all your systems are
set to use UTC, in which case this option isn’t required).
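To see why mixing settings produces a false lag reading, consider a heartbeat value written in UTC but interpreted by the reader in its local zone. The Python sketch below works through one such case. The UTC+8 zone, the timestamps, and the function lag_seconds are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def lag_seconds(naive_ts, assumed_zone, now):
    """Interpret a zone-less heartbeat value in assumed_zone and subtract it from now."""
    return (now - naive_ts.replace(tzinfo=assumed_zone)).total_seconds()


local_zone = timezone(timedelta(hours=8))  # hypothetical server time zone
instant = datetime(2016, 9, 25, 5, 4, 6, tzinfo=timezone.utc)
heartbeat = instant.replace(tzinfo=None)   # value written by an --update instance using --utc
now = instant + timedelta(seconds=1)       # the check runs one second later

print(lag_seconds(heartbeat, timezone.utc, now))  # reader also uses UTC
print(lag_seconds(heartbeat, local_zone, now))    # reader assumes local time
```

With UTC on both sides the computed lag is the real one second; when the reader assumes local time, the same row appears to be eight hours and one second old.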
--version
Prints version information.
--[no]version-check
Checks the versions of Percona Toolkit, the MySQL servers it connects to, Perl, and DBD::mysql,
and prints known problems with the specific versions found.
Check for the latest version of Percona Toolkit, MySQL, and other programs. (Quoted from the Percona Toolkit Documentation, Release 2.2.19.)
This is a standard "check for updates automatically" feature, with two additional features. First, the tool checks the version of other programs on the local system in addition to its own version. For example, it checks the
version of every MySQL server it connects to, Perl, and the Perl module DBD::mysql. Second, it checks for and warns about versions with known problems. For example, MySQL 5.5.25 had a critical bug and was re-released
as 5.5.25a.
Any updates or known problems are printed to STDOUT before the tool’s normal output. This feature should never interfere with the normal operation of the tool.
Permanent link to this article: http://www.linuxidc.com/Linux/2016-09/135581.htm