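1 Install the NRPE client on the nginx server
The nginx service needs to be monitored; otherwise, if it goes down and is not repaired promptly, the web application is affected. The nginx processes running on the web server look like this: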
[root@lb-net-2 ~]# ps aux|grep nginx
nobody 15294 0.0 0.0 22432 3464 ? S Jul03 0:05 nginx: worker process
nobody 15295 0.0 0.0 22432 3480 ? S Jul03 0:05 nginx: worker process
……
nobody 15316 0.0 0.0 22432 3468 ? S Jul03 0:05 nginx: worker process
nobody 15317 0.0 0.0 22432 3480 ? S Jul03 0:05 nginx: worker process
root 16260 0.0 0.0 20584 1684 ? Ss Jun18 0:00 nginx: master process /usr/local/nginx/sbin/nginx
root 21211 0.0 0.0 103252 860 pts/1 S+ 17:50 0:00 grep nginx
1.1 Install the NRPE client from RPM packages
Download: http://download.csdn.net/detail/mchdba/7493875
[root@localhost nagios]# ll
total 768
-rw-r--r-- 1 root root 713389 12-16 12:08 nagios-plugins-1.4.11-1.x86_64.rpm
-rw-r--r-- 1 root root 32706 12-16 12:09 nrpe-2.12-1.x86_64.rpm
-rw-r--r-- 1 root root 18997 12-16 12:08 nrpe-plugin-2.12-1.x86_64.rpm
[root@localhost nagios]# rpm -ivh *.rpm --nodeps --force
1.2 At the end of the configuration file, add the check commands and the IP address of the monitoring (Nagios) server
[root@localhost nagios]# vim /etc/nagios/nrpe.cfg
# add by tim on 2014-06-11
command[check_users]=/usr/local/nagios/libexec/check_users -w 8 -c 15
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
#command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 50 -c 80
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 750 -c 800
command[check-host-alive]=/usr/local/nagios/libexec/check_ping -H 10.xx.3.29 -w 3000.0,80% -c 5000.0,100% -p 5
allowed_hosts = 127.0.0.1,10.xx.3.41
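Each command[...] line defines a name that the Nagios server can request through check_nrpe -c. A minimal sketch for listing the names defined in a config file; the function name list_nrpe_commands and the temporary sample file are illustrative, not part of NRPE:

```shell
#!/bin/bash
# Hypothetical helper: print the command names defined in an nrpe.cfg file,
# skipping commented-out lines.
list_nrpe_commands() {
    sed -n 's/^command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Illustrative sample; on the client you would pass /etc/nagios/nrpe.cfg.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
command[check_users]=/usr/local/nagios/libexec/check_users -w 8 -c 15
#command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 50 -c 80
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 750 -c 800
allowed_hosts = 127.0.0.1
EOF
names=$(list_nrpe_commands "$cfg")
echo "$names"
rm -f "$cfg"
```

Only the two active definitions are printed; the commented-out line and allowed_hosts are ignored.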
Check that the command works:
[root@web-9 nrpe-2.15]# /usr/local/nagios/libexec/check_users -w 8 -c 15
USERS OK - 2 users currently logged in |users=2;8;15;0
[root@web-9 nrpe-2.15]#
The output starts with USERS OK, so the command works.
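Nagios and NRPE judge a plugin by its exit code as well as its text: 0 is OK, 1 is WARNING, 2 is CRITICAL and 3 is UNKNOWN. A minimal sketch for inspecting that code by hand; the function name status_name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: map a plugin exit code to its Nagios state name.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Usage on the client (plugin path as used above):
#   /usr/local/nagios/libexec/check_users -w 8 -c 15; status_name $?
status_name 0
status_name 2
```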
1.3 Starting NRPE fails with the following error:
[root@web-9 ~]# service nrpe restart
Shutting down nrpe: [FAILED]
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
[FAILED]
[root@web-9 ~]#
[root@db-m2-slave-1 nagios_client]# service nrpe start
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
[FAILED]
[root@db-m2-slave-1 nagios_client]#
Create a symbolic link:
[root@db-m2-slave-1 nagios_client]# ln -s /usr/lib64/libssl.so /usr/lib64/libssl.so.6
(If libssl.so does not exist, link to libssl.so.10 instead: ln -s /usr/lib64/libssl.so.10 /usr/lib64/libssl.so.6)
[root@db-m2-slave-1 nagios_client]#
Start NRPE again:
[root@db-m2-slave-1 nagios_client]# service nrpe start
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libcrypto.so.6: cannot open shared object file: No such file or directory
[FAILED]
[root@web-10 ~]# ll /usr/lib64/libcrypto.so
lrwxrwxrwx. 1 root root 18 Oct 13 2013 /usr/lib64/libcrypto.so -> libcrypto.so.1.0.0
[root@db-m2-slave-1 nagios_client]#
Create another link, this time for libcrypto:
[root@db-m2-slave-1 nagios_client]# ln -s /usr/lib64/libcrypto.so /usr/lib64/libcrypto.so.6
(If libcrypto.so does not exist, link to libcrypto.so.10 instead: ln -s /usr/lib64/libcrypto.so.10 /usr/lib64/libcrypto.so.6)
[root@db-m2-slave-1 nagios_client]# service nrpe start
Starting nrpe: [  OK  ]
[root@db-m2-slave-1 nagios_client]#
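Rather than discovering missing libraries one restart at a time, the unresolved ones can be listed in a single pass with ldd. A minimal sketch, assuming ldd is available on the client; the helper name missing_libs is hypothetical and the sample text is for illustration only:

```shell
#!/bin/bash
# Hypothetical helper: read ldd output on stdin and print the names of
# the shared libraries reported as "not found".
missing_libs() {
    awk '/not found/ {print $1}'
}

# Usage on the client (binary path taken from the error message above):
#   ldd /usr/sbin/nrpe | missing_libs
# Illustrative ldd output:
sample='    libssl.so.6 => not found
    libcrypto.so.6 => not found
    libc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)'
printf '%s\n' "$sample" | missing_libs
```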
1.4 Check that NRPE is running properly:
Run a check from the Nagios server:
[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.xx.3.xx
NRPE v2.12
[root@cache-2 ~]#
The reply NRPE v2.12 shows that the connection succeeded and that the NRPE service on the client is ready.
2 The simpler approach: monitoring with check_http
check_http can be used in /etc/nagios/nrpe.cfg to find out whether nginx is running:
(1) Edit nrpe.cfg
vim /etc/nagios/nrpe.cfg
command[check_nginx_status]=/usr/lib/nagios/plugins/check_http -I localhost -p 80 -u /nginx_status -e 200 -w 3 -c 10
(2) Restart the NRPE service
[root@lb-net-2 ~]# service nrpe restart
Shutting down nrpe: [  OK  ]
Starting nrpe: [  OK  ]
[root@lb-net-2 ~]#
(3) Run the check from the Nagios server; it succeeds.
[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H10.xx.1.22 -c check_nginx_status
HTTP OK HTTP/1.1 200 OK - 254 bytes in 0.002 seconds |time=0.002031s;3.000000;10.000000;0.000000 size=254B;;;0
(4) Add the check_nginx_status service to services.cfg
define service{
    host_name lb-net-2
    service_description check_nginx_status
    check_command check_nrpe!check_nginx_status
    max_check_attempts 5
    normal_check_interval 3
    retry_check_interval 2
    check_period 24x7
    notification_interval 10
    notification_period 24x7
    notification_options w,u,c,r
    contact_groups opsweb
}
(5) Add the check_nginx_status command to command.cfg
define command{
command_name check_nginx_status
command_line $USER1$/check_nginx_status -I $HOSTADDRESS$ -w $Warning$ -c $Cri$
}
(6) Reload Nagios
[root@cache-2 objects]# service nagios reload
Running configuration check...
Reloading nagios configuration...
done
[root@cache-2 objects]#
(7) Check the nginx monitoring service in the Nagios web interface (screenshot in the original post).
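The plugin output returned in step (3) has two parts separated by a | character: the human-readable status text and the performance data that Nagios can graph. A minimal sketch of splitting them with shell parameter expansion; the sample line is the one from step (3) and the variable names are illustrative:

```shell
#!/bin/bash
# Sample plugin output, taken from step (3) above.
line='HTTP OK HTTP/1.1 200 OK - 254 bytes in 0.002 seconds |time=0.002031s;3.000000;10.000000;0.000000 size=254B;;;0'

status_text=${line%%|*}   # everything before the first "|"
perfdata=${line#*|}       # everything after the first "|"

echo "status:   $status_text"
echo "perfdata: $perfdata"
```

In the perfdata part, time=0.002031s;3.000000;10.000000 carries the measured value followed by the -w 3 and -c 10 thresholds configured in nrpe.cfg.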
3 Monitor the nginx service with a script
3.1 Debugging walkthrough
[root@lb-net-2 run]# find / -name nginx.pid
/usr/local/nginx/logs/nginx.pid
[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000
expr: wrong number of arguments
expr: syntax error
(standard_in) 1: syntax error
/usr/lib/nagios/plugins/check_nginxstatus: line 258: [: : integer expression expected
/usr/lib/nagios/plugins/check_nginxstatus: line 262: [: : integer expression expected
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
Look at line 262 and change the logical operator "-a" to "&&":
[root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus
[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000
expr: wrong number of arguments
expr: syntax error
(standard_in) 1: syntax error
/usr/lib/nagios/plugins/check_nginxstatus: line 258: [: missing `]'
/usr/lib/nagios/plugins/check_nginxstatus: line 262: [: : integer expression expected
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
[root@lb-net-2 run]#
The result line says OK, but errors are still printed, so edit the file again.
[root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus
[root@lb-net-2 run]#
[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000
expr: wrong number of arguments
expr: syntax error
(standard_in) 1: syntax error
/usr/lib/nagios/plugins/check_nginxstatus: line 258: [: missing `]'
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
[root@lb-net-2 run]#
Change [ ] to [[ ]] and the bracket errors go away:
[root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus
[root@lb-net-2 run]#
[root@lb-net-2 run]#
[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000
expr: wrong number of arguments
expr: syntax error
(standard_in) 1: syntax error
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
[root@lb-net-2 run]#
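The "integer expression expected" messages come from comparing an empty string numerically: reqpsec is empty because no status data was read. In bash, [[ ]] evaluates an empty operand as 0, which silences the message without fixing its cause. A minimal sketch, assuming bash, of making that default explicit while keeping single brackets; the threshold 1500 matches the -w value used above:

```shell
#!/bin/bash
reqpsec=""      # empty, as when the status page could not be read
warning=1500

# [ "$reqpsec" -ge "$warning" ] would print "integer expression expected"
# here; ${reqpsec:-0} substitutes 0 when the variable is empty or unset.
if [ "${reqpsec:-0}" -ge "$warning" ]; then
    result="WARNING"
else
    result="OK"
fi
echo "$result"
```

Either way the comparison only hides the symptom; the empty values themselves still need to be explained, which the rest of this section does.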
After commenting out the line reqpcon=`echo "scale=2; $reqpsec / $conpsec" | bc -l`, the (standard_in) 1: syntax error message is gone:
[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000
expr: wrong number of arguments
expr: syntax error
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
[root@lb-net-2 run]#
After commenting out the line reqpsec=`expr $tmp2_reqpsec - $tmp1_reqpsec`, the "expr: wrong number of arguments" error is gone as well, leaving one error:
[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000
expr: syntax error
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
After also commenting out the remaining expr call, conpsec=`expr $tmp2_conpsec - $tmp1_conpsec`, the "expr: syntax error" message no longer appears:
[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
[root@lb-net-2 run]#
At this point 'reqpsec'= 'conpsec'= 'conpreq'= are all empty even though nginx is running. Where is the problem? It turned out that the nginx_status page was not enabled. The following configuration has to be added to /usr/local/nginx/conf/nginx.conf:
# add the pid parameter
pid logs/nginx.pid;
#charset koi8-r;
access_log logs/host.access.log main;
location /nginx_status {
    stub_status on;
    access_log off;
    deny all;
}
Then reload nginx. A new nginx-status file is generated, but it is empty:
[root@lb-net-2 logs]# ll /tmp/nginx*
-rw-r--r--. 1 root root 0 Jul  3 15:06 /tmp/nginx-status.1
[root@lb-net-2 logs]#
Check the nginx error log:
[root@lb-net-2 logs]# cd /usr/local/nginx/logs/
[root@lb-net-2 logs]# tail -n 300 error.log
……
2014/07/03 15:05:47 [error] 4285#0: *1851293 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
2014/07/03 15:05:48 [error] 4285#0: *1851294 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
2014/07/03 15:06:12 [error] 4282#0: *1851362 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
2014/07/03 15:06:13 [error] 4282#0: *1851363 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
2014/07/03 15:06:55 [error] 4285#0: *1851509 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
2014/07/03 15:06:56 [error] 4285#0: *1851519 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
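Check the options nginx was compiled with:
[root@lb-net-2 logs]# /usr/local/nginx/sbin/nginx -V
nginx version: nginx/1.4.2
built by gcc 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)
configure arguments: --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_realip_module
This confirms that the stub_status module is compiled in. The error log shows that the deny all; rule is rejecting the local requests, so edit the configuration file, comment out deny all; and reload nginx.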
[root@lb-net-2 logs]# vim /usr/local/nginx/conf/nginx.conf
#deny all;
[root@lb-net-2 logs]# service nginx reload
reload nginx
[root@lb-net-2 logs]#
[root@lb-net-2 logs]# ll /tmp/nginx*
ls: cannot access /tmp/nginx*: No such file or directory
[root@lb-net-2 logs]#
The /tmp/nginx-status.1 status file is still not generated. The Nagios nginx plugin was expected to read its data from nginx-status.1, so without that file there seemed to be no way to get the data.
Searching Google further for "nginx stub_status does not generate nginx-status.1" turned up a comment saying that, once the configuration is right, it does not matter whether this status file exists. So I ran the script directly to see whether it works.
[root@lb-net-2 logs]# ll /tmp/nginx*
ls: cannot access /tmp/nginx*: No such file or directory
[root@lb-net-2 logs]# /root/check_nginx2.sh -H localhost -P 80 -p /usr/local/nginx/logs/ -n nginx.pid -s nginx_status -w 15000 -c 20000
OK - nginx is running. 1 requests per second, 2 connections per second (.50 requests per connection) | 'reqpsec'=1 'conpsec'=2 'conpreq'=.50 ]
[root@lb-net-2 logs]#
'reqpsec'=1 'conpsec'=2 'conpreq'=.50 now contain data. Check again whether the file was generated:
[root@lb-net-2 logs]# ll /tmp/nginx*
ls: cannot access /tmp/nginx*: No such file or directory
[root@lb-net-2 logs]#
The file is still not generated, but the check returns data, so the presence of nginx-status.1 under /tmp/ does not matter. The script shows why:
[root@lb-net-2 logs]# vim /usr/lib/nagios/plugins/check_nginxstatus
180 get_status() {
181     if [ "$secure" = 1 ]
182     then
183         wget_opts="-O- -q -t 3 -T 3 --no-check-certificate"
184         out1=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
185         sleep 1
186         out2=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
187     else
188         wget_opts="-O- -q -t 3 -T 3"
189         out1=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
190         sleep 1
191         out2=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
192     fi
193
194     if [ -z "$out1" -o -z "$out2" ]
195     then
196         echo "UNKNOWN - Local copy/copies of $status_page is empty."
197         exit $ST_UK
198     fi
199 }
The status data is fetched by requesting the status URL twice, one second apart, with a command such as wget -O- -q -t 3 -T 3 --no-check-certificate http://10.xx.xx.xx:80/nginx_status, rather than by loading the /tmp/nginx-status.1 file. Requesting http://10.xx.xx.xx:80/nginx_status directly returns nginx's runtime statistics (screenshot in the original post).
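The two samples can be reduced to rates by hand to see what the plugin computes. A minimal sketch, assuming the usual stub_status page layout; the sample counter values are made up for illustration:

```shell
#!/bin/bash
# Two stub_status samples taken one second apart (illustrative values).
out1='Active connections: 1
server accepts handled requests
 16 16 18
Reading: 0 Writing: 1 Waiting: 0'
out2='Active connections: 1
server accepts handled requests
 18 18 21
Reading: 0 Writing: 1 Waiting: 0'

# Unquoted echo flattens each sample to one line, so field 9 is the
# "handled" counter and field 10 is "requests", as in the plugin's get_vals.
conn1=$(echo $out1 | awk '{print $9}')
conn2=$(echo $out2 | awk '{print $9}')
req1=$(echo $out1 | awk '{print $10}')
req2=$(echo $out2 | awk '{print $10}')

conpsec=$((conn2 - conn1))
reqpsec=$((req2 - req1))
echo "reqpsec=$reqpsec conpsec=$conpsec"
```

If either sample is empty, the subtraction has nothing to work with, which is why the plugin exits with UNKNOWN before reaching this step.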
Running the check from the Nagios server reports an error:
[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H10.xx.xx.xx -c check_nginx_status
UNKNOWN - Local copy/copies of nginx_status is empty.
[root@cache-2 ~]#
Searching the monitoring script for 'Local copy/copies of nginx_status is empty.' leads to line 197, with the following code:
195 if [ -z "$out1" -o -z "$out2" ]
196 then
197     echo "UNKNOWN - Local copy/copies of $status_page is empty."
198     exit $ST_UK
199 fi
The test if [ -z "$out1" -o -z "$out2" ] is true, so the script exits at this point. Further debugging shows that when the script is invoked from the Nagios server and reaches lines 190 to 192:
out1=`/usr/bin/wget ${wget_opts} http://${hostname}:${port}/${status_page}`
sleep 1
out2=`/usr/bin/wget ${wget_opts} http://${hostname}:${port}/${status_page}`
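both out1 and out2 are empty, so the following if [ -z "$out1" -o -z "$out2" ] test succeeds, the script prints UNKNOWN - Local copy/copies of $status_page is empty. and exits immediately.
Explanation: the plugin calls wget to fetch the nginx_status page, and in this environment wget only returned data when run as root. (wget itself does not require root, so the underlying cause is specific to this setup.) The workaround used here is to let the nagios user become root without a password, so that the plugin can be run as sudo /usr/lib/nagios/plugins/check_nginxstatus. On CentOS, sudo cannot be called without a terminal by default, so edit /etc/sudoers: find #Defaults requiretty, uncomment it, and add one more line so that the nagios user can call sudo without a login terminal:
Defaults requiretty
Defaults:nagios !requiretty
# Let nagios use sudo without a password; the rule can be limited to specific commands (with arguments) instead of ALL.
nagios ALL=(ALL) NOPASSWD: ALL
After this change, run the check again; the data now appears:
[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H10.xx.xx.xx -c check_nginx_status
OK - nginx is running. 1 requests per second, 1 connections per second (1.00 requests per connection) | 'reqpsec'=1 'conpsec'=1 'conpreq'=1.00 ]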
[root@cache-2 ~]#
3.2 The check_nginxstatus script in full
#!/bin/bash
# bash is required: the threshold checks at the end of the script use [[ ]]
PROGNAME=`basename $0`
VERSION="Version 1.1,"
AUTHOR="tim man"
# Nagios plugin exit codes
ST_OK=0
ST_WR=1
ST_CR=2
ST_UK=3
# Defaults, overridable from the command line
hostname="localhost"
port=80
path_pid=/var/run
name_pid="nginx.pid"
status_page="nginx_status"
pid_check=1
secure=0
print_version() {
    echo "$VERSION $AUTHOR"
}
print_help() {
    print_version $PROGNAME $VERSION
    echo ""
    echo "$PROGNAME is a Nagios plugin to check whether nginx is running."
    echo "It also parses the nginx's status page to get requests and"
    echo "connections per second as well as requests per connection. You"
    echo "may have to alter your nginx configuration so that the plugin"
    echo "can access the server's status page."
    echo "The plugin is highly configurable for this reason. See below for"
    echo "available options."
    echo ""
    echo "$PROGNAME -H localhost -P 80 -p /var/run -n nginx.pid "
    echo "  -s nginx_status [-w INT] [-c INT] [-S] [-N]"
    echo ""
    echo "Options:"
    echo "  -H/--hostname)"
    echo "     Defines the hostname. Default is: localhost"
    echo "  -P/--port)"
    echo "     Defines the port. Default is: 80"
    echo "  -p/--path-pid)"
    echo "     Path where nginx's pid file is being stored. You might need"
    echo "     to alter this path according to your distribution. Default"
    echo "     is: /var/run"
    echo "  -n/--name-pid)"
    echo "     Name of the pid file. Default is: nginx.pid"
    echo "  -N/--no-pid-check)"
    echo "     Turn this on, if you don't want to check for a pid file"
    echo "     whether nginx is running, e.g. when you're checking a"
    echo "     remote server. Default is: off"
    echo "  -s/--status-page)"
    echo "     Name of the server's status page defined in the location"
    echo "     directive of your nginx configuration. Default is:"
    echo "     nginx_status"
    echo "  -S/--secure)"
    echo "     In case your server is only reachable via SSL, use this"
    echo "     switch to use HTTPS instead of HTTP. Default is: off"
    echo "  -w/--warning)"
    echo "     Sets a warning level for requests per second. Default is: off"
    echo "  -c/--critical)"
    echo "     Sets a critical level for requests per second. Default is:"
    echo "     off"
    exit $ST_UK
}
# Parse command-line options
while test -n "$1"; do
    case "$1" in
        --help|-h)
            print_help
            exit $ST_UK
            ;;
        --version|-v)
            print_version $PROGNAME $VERSION
            exit $ST_UK
            ;;
        --hostname|-H)
            hostname=$2
            shift
            ;;
        --port|-P)
            port=$2
            shift
            ;;
        --path-pid|-p)
            path_pid=$2
            shift
            ;;
        --name-pid|-n)
            name_pid=$2
            shift
            ;;
        --no-pid-check|-N)
            pid_check=0
            ;;
        --status-page|-s)
            status_page=$2
            shift
            ;;
        --secure|-S)
            secure=1
            ;;
        --warning|-w)
            warning=$2
            shift
            ;;
        --critical|-c)
            critical=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_help
            exit $ST_UK
            ;;
    esac
    shift
done
# Record whether the warning/critical thresholds are set consistently
get_wcdiff() {
    if [ ! -z "$warning" -a ! -z "$critical" ]
    then
        wclvls=1
        if [ ${warning} -ge ${critical} ]
        then
            wcdiff=1
        fi
    elif [ ! -z "$warning" -a -z "$critical" ]
    then
        wcdiff=2
    elif [ -z "$warning" -a ! -z "$critical" ]
    then
        wcdiff=3
    fi
}
# Abort with UNKNOWN when the thresholds are inconsistent
val_wcdiff() {
    if [ "$wcdiff" = 1 ]
    then
        echo "Please adjust your warning/critical thresholds. The warning must be lower than the critical level!"
        exit $ST_UK
    elif [ "$wcdiff" = 2 ]
    then
        echo "Please also set a critical value when you want to use warning/critical thresholds!"
        exit $ST_UK
    elif [ "$wcdiff" = 3 ]
    then
        echo "Please also set a warning value when you want to use warning/critical thresholds!"
        exit $ST_UK
    fi
}
# retval=0 when the pid file exists, 1 otherwise
check_pid() {
    if [ -f "$path_pid/$name_pid" ]
    then
        retval=0
    else
        retval=1
    fi
}
# Fetch the status page twice, one second apart
get_status() {
    if [ "$secure" = 1 ]
    then
        # -S: fetch over HTTPS, as the help text describes
        wget_opts="-O- -q -t 3 -T 3 --no-check-certificate"
        out1=`/usr/bin/wget ${wget_opts} https://${hostname}:${port}/${status_page}`
        sleep 1
        out2=`/usr/bin/wget ${wget_opts} https://${hostname}:${port}/${status_page}`
    else
        wget_opts="-O- -q -t 3 -T 3"
        out1=`/usr/bin/wget ${wget_opts} http://${hostname}:${port}/${status_page}`
        sleep 1
        out2=`/usr/bin/wget ${wget_opts} http://${hostname}:${port}/${status_page}`
    fi
    if [ -z "$out1" -o -z "$out2" ]
    then
        echo "out1:$out1 out2:$out2, UNKNOWN - Local copy/copies of $status_page is empty."
        exit $ST_UK
    fi
}
# Turn the two samples into per-second rates
get_vals() {
    # Unquoted echo flattens the status page to one line:
    # field 9 is the "handled" counter, field 10 is "requests"
    tmp1_reqpsec=`echo ${out1}|awk '{print $10}'`
    tmp2_reqpsec=`echo ${out2}|awk '{print $10}'`
    reqpsec=`expr $tmp2_reqpsec - $tmp1_reqpsec`
    tmp1_conpsec=`echo ${out1}|awk '{print $9}'`
    tmp2_conpsec=`echo ${out2}|awk '{print $9}'`
    conpsec=`expr $tmp2_conpsec - $tmp1_conpsec`
    reqpcon=`echo "scale=2; $reqpsec / $conpsec" | bc -l`
    if [ "$reqpcon" = ".99" ]
    then
        reqpcon="1.00"
    fi
}
do_output() {
    output="nginx is running. $reqpsec requests per second, $conpsec connections per second ($reqpcon requests per connection)"
}
do_perfdata() {
    perfdata="'reqpsec'=$reqpsec 'conpsec'=$conpsec 'conpreq'=$reqpcon"
}
# Here we
get_wcdiff
val_wcdiff
if [ ${pid_check} = 1 ]
then
    check_pid
    if [ "$retval" = 1 ]
    then
        echo "There's no pid file for nginx. Is nginx running? Please also make sure whether your pid path and name is correct."
        exit $ST_CR
    fi
fi
get_status
get_vals
do_output
do_perfdata
if [[ -n "$warning" ]] && [[ -n "$critical" ]]
then
    if [[ "$reqpsec" -ge "$warning" ]] && [[ "$reqpsec" -lt "$critical" ]]
    then
        echo "WARNING - ${output} | ${perfdata}"
        exit $ST_WR
    elif [ "$reqpsec" -ge "$critical" ]
    then
        echo "CRITICAL - ${output} | ${perfdata}"
        exit $ST_CR
    else
        echo "OK - ${output} | ${perfdata} ]"
        exit $ST_OK
    fi
else
    echo "OK - ${output} | ${perfdata}"
    exit $ST_OK
fi
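The final branch of the script can be exercised on its own, without a running nginx. A minimal sketch, assuming bash, that wraps the same warning/critical comparison in a function; the name classify is hypothetical and the thresholds are the -w 15000 -c 20000 values used earlier:

```shell
#!/bin/bash
# Hypothetical helper: classify a requests-per-second value against
# warning and critical thresholds, mirroring the script's last block.
classify() {
    local value=$1 warning=$2 critical=$3
    if [ "$value" -ge "$critical" ]; then
        echo "CRITICAL"
    elif [ "$value" -ge "$warning" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

classify 1 15000 20000
classify 16000 15000 20000
classify 25000 15000 20000
```

Testing the critical threshold first keeps the logic to one comparison per branch; the result is the same as the script's "at least warning and below critical" test.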