Monitoring the Nginx Service with Nagios: A Detailed Walkthrough

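1 Install the NRPE client on the nginx server

The nginx service needs to be monitored: if it goes down and is not repaired promptly, the web application is affected. The nginx processes running on the web host look like this: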
[root@lb-net-2 ~]# ps aux|grep nginx
nobody 15294 0.0 0.0 22432 3464 ? S Jul03 0:05 nginx: worker process
nobody 15295 0.0 0.0 22432 3480 ? S Jul03 0:05 nginx: worker process
……
nobody 15316 0.0 0.0 22432 3468 ? S Jul03 0:05 nginx: worker process
nobody 15317 0.0 0.0 22432 3480 ? S Jul03 0:05 nginx: worker process
root 16260 0.0 0.0 20584 1684 ? Ss Jun18 0:00 nginx: master process /usr/local/nginx/sbin/nginx
root 21211 0.0 0.0 103252 860 pts/1 S+ 17:50 0:00 grep nginx

1.1 Install the NRPE client from RPM packages

Download: http://download.csdn.net/detail/mchdba/7493875

[root@localhost nagios]# ll
total 768
-rw-r--r-- 1 root root 713389 12-16 12:08 nagios-plugins-1.4.11-1.x86_64.rpm
-rw-r--r-- 1 root root 32706 12-16 12:09 nrpe-2.12-1.x86_64.rpm
-rw-r--r-- 1 root root 18997 12-16 12:08 nrpe-plugin-2.12-1.x86_64.rpm
[root@localhost nagios]# rpm -ivh *.rpm --nodeps --force

1.2 At the end of the configuration file, add the command definitions and the IP address of the monitoring server

[root@localhost nagios]# vim /etc/nagios/nrpe.cfg

# add by tim on 2014-06-11
command[check_users]=/usr/local/nagios/libexec/check_users -w 8 -c 15
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sda1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
#command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 50 -c 80
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 750 -c 800
command[check-host-alive]=/usr/local/nagios/libexec/check_ping -H 10.xx.3.29 -w 3000.0,80% -c 5000.0,100% -p 5
allowed_hosts = 127.0.0.1,10.xx.3.41

Check that a command works by running it locally:

[root@web-9 nrpe-2.15]# /usr/local/nagios/libexec/check_users -w 8 -c 15
USERS OK - 2 users currently logged in |users=2;8;15;0
[root@web-9 nrpe-2.15]#

The USERS OK output shows that the command works.
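Nagios decides the state of a check from the plugin's exit code, not from the text it prints. The sketch below maps the four standard plugin exit codes to their names; run_check is a hypothetical helper name used only here, and the stand-in commands replace the real plugin so that the example runs anywhere.

```shell
#!/bin/sh
# Map a Nagios plugin exit code to its status name.
# Codes outside 0-2 are reported as UNKNOWN in this sketch.
status_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run any command as a check and print how its exit code would be classified.
# (run_check is a hypothetical helper, not part of Nagios or NRPE.)
run_check() {
    "$@" >/dev/null 2>&1
    status_name "$?"
}

# Stand-in commands; on the monitored host the argument would be a plugin
# line from nrpe.cfg, e.g. /usr/local/nagios/libexec/check_users -w 8 -c 15
run_check true
run_check sh -c 'exit 2'
```

With the stand-in commands, the two calls print OK and CRITICAL.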

1.3 Starting nrpe fails with the following error:

[root@web-9 ~]# service nrpe restart
Shutting down nrpe: [FAILED]
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
[FAILED]
[root@web-9 ~]#

The same error on a second host:

[root@db-m2-slave-1 nagios_client]# service nrpe start
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libssl.so.6: cannot open shared object file: No such file or directory
[FAILED]
[root@db-m2-slave-1 nagios_client]#

Create a symbolic link for the missing library:

[root@db-m2-slave-1 nagios_client]# ln -s /usr/lib64/libssl.so /usr/lib64/libssl.so.6

(If libssl.so does not exist, link another version such as libssl.so.10 instead: ln -s /usr/lib64/libssl.so.10 /usr/lib64/libssl.so.6)

Start nrpe again:

[root@db-m2-slave-1 nagios_client]# service nrpe start
Starting nrpe: /usr/sbin/nrpe: error while loading shared libraries: libcrypto.so.6: cannot open shared object file: No such file or directory
[FAILED]

[root@web-10 ~]# ll /usr/lib64/libcrypto.so
lrwxrwxrwx. 1 root root 18 Oct 13 2013 /usr/lib64/libcrypto.so -> libcrypto.so.1.0.0

Create a second link, this time for libcrypto:

[root@db-m2-slave-1 nagios_client]# ln -s /usr/lib64/libcrypto.so /usr/lib64/libcrypto.so.6

(If libcrypto.so does not exist, link libcrypto.so.10 instead: ln -s /usr/lib64/libcrypto.so.10 /usr/lib64/libcrypto.so.6)

[root@db-m2-slave-1 nagios_client]# service nrpe start
Starting nrpe: [  OK  ]
[root@db-m2-slave-1 nagios_client]#
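Instead of discovering the missing libraries one failed start at a time, all of them can be listed in one pass with ldd. The sketch below filters ldd output for entries marked "not found"; missing_libs is a hypothetical helper name, and the sample text is illustrative rather than captured from the hosts above.

```shell
#!/bin/sh
# Print the names of shared libraries that ldd reports as "not found".
# Reads ldd output on stdin, so it also works on saved output.
missing_libs() {
    awk '/=> not found/ { print $1 }'
}

# Illustrative ldd output (not captured from the hosts in this article).
sample='  libssl.so.6 => not found
  libcrypto.so.6 => not found
  libc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)'

printf '%s\n' "$sample" | missing_libs

# On the NRPE host itself: ldd /usr/sbin/nrpe | missing_libs
```

Each name printed is a library that still needs to be installed or linked before nrpe can start.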

1.4 Verify that nrpe is running

Run the check from the Nagios server:

[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.xx.3.xx
NRPE v2.12
[root@cache-2 ~]#

The NRPE v2.12 response shows that the connection works and the NRPE client on the monitored host is ready.

2 A simpler approach: monitoring with check_http

check_http can be used in /etc/nagios/nrpe.cfg to check whether nginx is running:

(1) Edit nrpe.cfg

vim /etc/nagios/nrpe.cfg

command[check_nginx_status]=/usr/lib/nagios/plugins/check_http -I localhost -p 80 -u /nginx_status -e 200 -w 3 -c 10

(2) Restart the nrpe service

[root@lb-net-2 ~]# service nrpe restart
Shutting down nrpe: [  OK  ]
Starting nrpe: [  OK  ]
[root@lb-net-2 ~]#

(3) Run the check from the Nagios server; it succeeds.

[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.xx.1.22 -c check_nginx_status
HTTP OK HTTP/1.1 200 OK - 254 bytes in 0.002 seconds |time=0.002031s;3.000000;10.000000;0.000000 size=254B;;;0
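Everything after the "|" in the plugin output is performance data in label=value;warn;crit;min form; here the 3 and 10 passed with -w and -c reappear as the warning and critical fields of time. The sketch below splits such a line into label and value pairs. The function names are hypothetical, and it assumes labels without spaces, which holds for the line above.

```shell
#!/bin/sh
# Return the performance-data part of a plugin output line
# (the text after the first "|"); empty if there is none.
perfdata_of() {
    case "$1" in
        *"|"*) printf '%s\n' "${1#*|}" ;;
        *)     printf '\n' ;;
    esac
}

# Print one "label value" pair per perfdata item; the value is the part
# between "=" and the first ";". Assumes labels contain no spaces.
perfdata_values() {
    for item in $(perfdata_of "$1"); do
        label=${item%%=*}
        rest=${item#*=}
        printf '%s %s\n' "$label" "${rest%%;*}"
    done
}

line='HTTP OK HTTP/1.1 200 OK - 254 bytes in 0.002 seconds |time=0.002031s;3.000000;10.000000;0.000000 size=254B;;;0'
perfdata_values "$line"
```

For the sample line this prints "time 0.002031s" and "size 254B".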

(4) Add the check_nginx_status service to services.cfg

define service{
        host_name               lb-net-2
        service_description     check_nginx_status
        check_command           check_nrpe!check_nginx_status
        max_check_attempts      5
        normal_check_interval   3
        retry_check_interval    2
        check_period            24x7
        notification_interval   10
        notification_period     24x7
        notification_options    w,u,c,r
        contact_groups          opsweb
}

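(5) In command.cfg, make sure the check_nrpe command that the service above calls is defined. The service passes check_nginx_status to it as $ARG1$, and NRPE on the monitored host then runs the command of that name from nrpe.cfg.

define command{
        command_name    check_nrpe
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}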

(6) Reload nagios

[root@cache-2 objects]# service nagios reload
Running configuration check...
Reloading nagios configuration...
done
[root@cache-2 objects]#

(7) The nginx service now appears in the Nagios web interface.

[Screenshot: the check_nginx_status service in the Nagios web interface]

3 Monitoring nginx with a script

3.1 Debugging walkthrough

[root@lb-net-2 run]# find / -name nginx.pid

/usr/local/nginx/logs/nginx.pid

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000

expr: non-numeric argument
expr: syntax error
(standard_in) 1: syntax error
/usr/lib/nagios/plugins/check_nginxstatus: line 258: [: : integer expression expected
/usr/lib/nagios/plugins/check_nginxstatus: line 262: [: : integer expression expected
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

Look at line 262 and change the logical operator "-a" to "&&".

[root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000

expr: non-numeric argument
expr: syntax error
(standard_in) 1: syntax error
/usr/lib/nagios/plugins/check_nginxstatus: line 258: [: missing `]'
/usr/lib/nagios/plugins/check_nginxstatus: line 262: [: : integer expression expected
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
[root@lb-net-2 run]#

The plugin reports OK, but errors remain, so edit the file again.

[root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus

[root@lb-net-2 run]#

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000

expr: non-numeric argument
expr: syntax error
(standard_in) 1: syntax error
/usr/lib/nagios/plugins/check_nginxstatus: line 258: [: missing `]'
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
[root@lb-net-2 run]#

Changing [ ] to [[ ]] on that line removes the remaining "missing `]'" error.

[root@lb-net-2 run]# vim /usr/lib/nagios/plugins/check_nginxstatus

[root@lb-net-2 run]#

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -n /usr/local/nginx/logs/nginx.pid -s nginx_status -o /tmp/ -w 1500 -c 2000

expr: non-numeric argument
expr: syntax error
(standard_in) 1: syntax error
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
[root@lb-net-2 run]#

After commenting out the line reqpcon=`echo "scale=2; $reqpsec / $conpsec" | bc -l`, the (standard_in) 1: syntax error message no longer appears:

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000

expr: non-numeric argument
expr: syntax error
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]
[root@lb-net-2 run]#

After commenting out the line reqpsec=`expr $tmp2_reqpsec - $tmp1_reqpsec`, the expr: non-numeric argument message no longer appears, but one error remains:

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000

expr: syntax error
OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

After also commenting out the remaining expr line, conpsec=`expr $tmp2_conpsec - $tmp1_conpsec`, the expr: syntax error message no longer appears:

[root@lb-net-2 run]# /usr/lib/nagios/plugins/check_nginxstatus -H localhost -P 80 -p /usr/local/nginx/logs/ -s nginx_status -n nginx.pid -w 15000 -c 20000

OK - nginx is running. requests per second, connections per second (requests per connection) | 'reqpsec'= 'conpsec'= 'conpreq'= ]

[root@lb-net-2 run]#
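The expr and bc messages in the runs above are typical of arithmetic on empty or non-numeric values. Rather than commenting the calculations out, the operands can be normalised first. The following is a minimal sketch under that assumption; as_int and safe_ratio are hypothetical helper names, and it uses awk instead of the script's bc, so the result is printed as 0.50 rather than .50.

```shell
#!/bin/sh
# Treat an empty or non-numeric counter as 0 so later arithmetic cannot fail.
as_int() {
    case "$1" in
        ''|*[!0-9]*) echo 0 ;;
        *)           echo "$1" ;;
    esac
}

# Requests per connection with two decimals;
# prints 0.00 when there were no connections.
safe_ratio() {
    awk -v r="$(as_int "$1")" -v c="$(as_int "$2")" \
        'BEGIN { if (c == 0) print "0.00"; else printf "%.2f\n", r / c }'
}

safe_ratio 1 2     # 1 request over 2 connections
safe_ratio "" ""   # the empty case seen in the output above
```

The first call prints 0.50 and the second prints 0.00 instead of raising an error.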

At this point 'reqpsec'=, 'conpsec'= and 'conpreq'= are all empty even though nginx is running. What is wrong? Investigation showed that the nginx_status page was not enabled; the following has to be added to /usr/local/nginx/conf/nginx.conf:

# add the pid directive
pid logs/nginx.pid;

#charset koi8-r;
access_log logs/host.access.log main;

location /nginx_status {
    stub_status on;
    access_log off;
    deny all;
}

After nginx is reloaded, a new nginx-status file is generated, but it is empty:

[root@lb-net-2 logs]# ll /tmp/nginx*

-rw-r--r--. 1 root root 0 Jul  3 15:06 /tmp/nginx-status.1
[root@lb-net-2 logs]#

Check the nginx error log:

[root@lb-net-2 logs]# cd /usr/local/nginx/logs/
[root@lb-net-2 logs]# tail -n 300 error.log

……

2014/07/03 15:05:47 [error] 4285#0: *1851293 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
2014/07/03 15:05:48 [error] 4285#0: *1851294 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
2014/07/03 15:06:12 [error] 4282#0: *1851362 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
2014/07/03 15:06:13 [error] 4282#0: *1851363 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
2014/07/03 15:06:55 [error] 4285#0: *1851509 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"
2014/07/03 15:06:56 [error] 4285#0: *1851519 access forbidden by rule, client: 127.0.0.1, server: localhost, request: "GET /nginx_status HTTP/1.0", host: "localhost"

Check the nginx build options:

[root@lb-net-2 logs]# /usr/local/nginx/sbin/nginx -V

nginx version: nginx/1.4.2

built by gcc 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC)

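configure arguments: --prefix=/usr/local/nginx --with-http_stub_status_module --with-http_realip_module

This confirms that the stub_status module is compiled in, so the "access forbidden by rule" entries come from the deny all; rule in the location block. Edit the configuration file, comment out deny all;, and reload nginx. (Placing allow 127.0.0.1; before deny all; would also admit the local check while keeping the page closed to other clients.)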

[root@lb-net-2 logs]# vim /usr/local/nginx/conf/nginx.conf

#deny all;

[root@lb-net-2 logs]# service nginx reload

reload nginx

[root@lb-net-2 logs]#

[root@lb-net-2 logs]# ll /tmp/nginx*

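ls: cannot access /tmp/nginx*: No such file or directory
[root@lb-net-2 logs]#

The /tmp/nginx-status.1 status file is still not generated. The assumption at this point was that the Nagios nginx script reads its data from nginx-status.1 and cannot work without that file.

A search for "nginx stub_status does not generate nginx-status.1" turned up a comment saying that once stub_status is configured, it does not matter whether the file exists, so I ran the script directly to see whether it works.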

[root@lb-net-2 logs]# ll /tmp/nginx*

ls: cannot access /tmp/nginx*: No such file or directory
[root@lb-net-2 logs]# /root/check_nginx2.sh -H localhost -P 80 -p /usr/local/nginx/logs/ -n nginx.pid -s nginx_status -w 15000 -c 20000
OK - nginx is running. 1 requests per second, 2 connections per second (.50 requests per connection) | 'reqpsec'=1 'conpsec'=2 'conpreq'=.50 ]
[root@lb-net-2 logs]#

'reqpsec'=1 'conpsec'=2 'conpreq'=.50 now carry values. Check again whether the file has been generated:

[root@lb-net-2 logs]# ll /tmp/nginx*

ls: cannot access /tmp/nginx*: No such file or directory
[root@lb-net-2 logs]#

The file is still not generated, yet the check returns data, so the nginx-status.1 file under /tmp/ is not required. The script shows why:

[root@lb-net-2 logs]# vim /usr/lib/nagios/plugins/check_nginxstatus

180 get_status() {
181     if [ "$secure" = 1 ]
182     then
183         wget_opts="-O- -q -t 3 -T 3 --no-check-certificate"
184         out1=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
185         sleep 1
186         out2=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
187     else
188         wget_opts="-O- -q -t 3 -T 3"
189         out1=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
190         sleep 1
191         out2=`wget ${wget_opts} http://${hostname}:${port}/${status_page}`
192     fi
193
194     if [ -z "$out1" -o -z "$out2" ]
195     then
196         echo "UNKNOWN - Local copy/copies of $status_page is empty."
197         exit $ST_UK
198     fi
199 }

The script obtains the status data by running `wget -O- -q -t 3 -T 3 --no-check-certificate http://10.xx.xx.xx:80/nginx_status` twice, one second apart; it does not read /tmp/nginx-status.1. Opening http://10.xx.xx.xx:80/nginx_status directly returns the nginx runtime data.

[Screenshot: output of the nginx_status page]
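The per-second figures come from the difference between two readings of the status page taken one second apart. The sketch below reproduces that calculation on two made-up samples in the stub_status text format; counters is a hypothetical helper name, and the field positions (9 for handled connections, 10 for requests) are the ones the plugin's awk calls use.

```shell
#!/bin/sh
# Extract the cumulative "handled" and "requests" counters from stub_status text.
# echo with an unquoted variable collapses the page into one line, so the
# counters land in fields 9 and 10, the same fields the plugin reads.
counters() {
    echo $1 | awk '{ print $9, $10 }'
}

# Two illustrative samples taken one second apart (made-up numbers).
sample1='Active connections: 1
server accepts handled requests
 16 16 18
Reading: 0 Writing: 1 Waiting: 0'
sample2='Active connections: 1
server accepts handled requests
 18 18 19
Reading: 0 Writing: 1 Waiting: 0'

# $1 $2 = handled/requests of the first sample, $3 $4 = of the second.
set -- $(counters "$sample1") $(counters "$sample2")
conpsec=$(( $3 - $1 ))   # handled connections per second
reqpsec=$(( $4 - $2 ))   # requests per second
echo "reqpsec=$reqpsec conpsec=$conpsec"
```

With these samples the script prints reqpsec=1 conpsec=2. If the page cannot be fetched, both samples are empty and the subtraction has nothing to work on, which is why the plugin checks for empty output first.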

Running the check from the Nagios server reports an error:

[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H 10.xx.xx.xx -c check_nginx_status
UNKNOWN - Local copy/copies of nginx_status is empty.
[root@cache-2 ~]#

Searching the monitoring script for 'Local copy/copies of nginx_status is empty.' leads to line 197:

195     if [ -z "$out1" -o -z "$out2" ]
196     then
197         echo "UNKNOWN - Local copy/copies of $status_page is empty."
198         exit $ST_UK
199     fi

The test if [ -z "$out1" -o -z "$out2" ] is true, so the script exits at this point. Further debugging shows that when the script is invoked from the Nagios server, lines 190 to 192

out1=`/usr/bin/wget ${wget_opts} http://${hostname}:${port}/${status_page}`
sleep 1
out2=`/usr/bin/wget ${wget_opts} http://${hostname}:${port}/${status_page}`

leave both out1 and out2 empty, so the test that follows succeeds, the script prints "UNKNOWN - Local copy/copies of $status_page is empty." and exits.

Explanation: the plugin calls wget to fetch nginx_status, and on this host wget produced no output when the plugin ran as the nagios user through NRPE, although it worked for root. The workaround is to let the nagios user run the plugin through sudo without a password, i.e. sudo /usr/lib/nagios/plugins/check_nginxstatus. On CentOS, a process without a terminal cannot call sudo while requiretty is in effect, so edit /etc/sudoers: find #Defaults requiretty, uncomment it, and add one more line that exempts the nagios user from the terminal requirement:

Defaults requiretty
Defaults:nagios !requiretty

# Let nagios use sudo without a password; the rule can be limited to specific commands (optionally with arguments).
nagios ALL=(ALL) NOPASSWD: ALL

After this change, the check returns data:

[root@cache-2 ~]# /usr/local/nagios/libexec/check_nrpe -H10.xx.xx.xx -c check_nginx_status

OK - nginx is running. 1 requests per second, 1 connections per second (1.00 requests per connection) | 'reqpsec'=1 'conpsec'=1 'conpreq'=1.00 ]

[root@cache-2 ~]#

3.2 The check_nginxstatus script

#!/bin/bash
# bash is required: the threshold checks at the end of the script use [[ ]].
PROGNAME=`basename $0`
VERSION="Version 1.1,"
AUTHOR="tim man"
# Nagios plugin exit codes
ST_OK=0
ST_WR=1
ST_CR=2
ST_UK=3
# Defaults, overridable on the command line
hostname="localhost"
port=80
path_pid=/var/run
name_pid="nginx.pid"
status_page="nginx_status"
pid_check=1
secure=0
print_version() {
    echo "$VERSION $AUTHOR"
}
print_help() {
    print_version $PROGNAME $VERSION
    echo ""
    echo "$PROGNAME is a Nagios plugin to check whether nginx is running."
    echo "It also parses the nginx's status page to get requests and"
    echo "connections per second as well as requests per connection. You"
    echo "may have to alter your nginx configuration so that the plugin"
    echo "can access the server's status page."
    echo "The plugin is highly configurable for this reason. See below for"
    echo "available options."
    echo ""
    echo "$PROGNAME -H localhost -P 80 -p /var/run -n nginx.pid "
    echo " -s nginx_status [-w INT] [-c INT] [-S] [-N]"
    echo ""
    echo "Options:"
    echo " -H/--hostname)"
    echo " Defines the hostname. Default is: localhost"
    echo " -P/--port)"
    echo " Defines the port. Default is: 80"
    echo " -p/--path-pid)"
    echo " Path where nginx's pid file is being stored. You might need"
    echo " to alter this path according to your distribution. Default"
    echo " is: /var/run"
    echo " -n/--name-pid)"
    echo " Name of the pid file. Default is: nginx.pid"
    echo " -N/--no-pid-check)"
    echo " Turn this on, if you don't want to check for a pid file"
    echo " whether nginx is running, e.g. when you're checking a"
    echo " remote server. Default is: off"
    echo " -s/--status-page)"
    echo " Name of the server's status page defined in the location"
    echo " directive of your nginx configuration. Default is:"
    echo " nginx_status"
    echo " -S/--secure)"
    echo " In case your server is only reachable via SSL, use this"
    echo " switch to use HTTPS instead of HTTP. Default is: off"
    echo " -w/--warning)"
    echo " Sets a warning level for requests per second. Default is: off"
    echo " -c/--critical)"
    echo " Sets a critical level for requests per second. Default is:"
    echo " off"
    exit $ST_UK
}
# Parse command-line options
while test -n "$1"; do
    case "$1" in
        --help|-h)
            print_help
            exit $ST_UK
            ;;
        --version|-v)
            print_version $PROGNAME $VERSION
            exit $ST_UK
            ;;
        --hostname|-H)
            hostname=$2
            shift
            ;;
        --port|-P)
            port=$2
            shift
            ;;
        --path-pid|-p)
            path_pid=$2
            shift
            ;;
        --name-pid|-n)
            name_pid=$2
            shift
            ;;
        --no-pid-check|-N)
            pid_check=0
            ;;
        --status-page|-s)
            status_page=$2
            shift
            ;;
        --secure|-S)
            secure=1
            ;;
        --warning|-w)
            warning=$2
            shift
            ;;
        --critical|-c)
            critical=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_help
            exit $ST_UK
            ;;
    esac
    shift
done
# Record whether the warning/critical thresholds are complete and consistent
get_wcdiff() {
    if [ ! -z "$warning" -a ! -z "$critical" ]
    then
        wclvls=1
        if [ ${warning} -ge ${critical} ]
        then
            wcdiff=1
        fi
    elif [ ! -z "$warning" -a -z "$critical" ]
    then
        wcdiff=2
    elif [ -z "$warning" -a ! -z "$critical" ]
    then
        wcdiff=3
    fi
}
# Abort with UNKNOWN when the thresholds are unusable
val_wcdiff() {
    if [ "$wcdiff" = 1 ]
    then
        echo "Please adjust your warning/critical thresholds. The warning must be lower than the critical level!"
        exit $ST_UK
    elif [ "$wcdiff" = 2 ]
    then
        echo "Please also set a critical value when you want to use warning/critical thresholds!"
        exit $ST_UK
    elif [ "$wcdiff" = 3 ]
    then
        echo "Please also set a warning value when you want to use warning/critical thresholds!"
        exit $ST_UK
    fi
}
# retval=0 when the pid file exists, 1 otherwise
check_pid() {
    if [ -f "$path_pid/$name_pid" ]
    then
        retval=0
    else
        retval=1
    fi
}
# Fetch the status page twice, one second apart.
# The URL is built from -H, -P and -s (defaults: localhost, 80, nginx_status)
# instead of being hard-coded; -S switches to HTTPS as the help text describes.
get_status() {
    if [ "$secure" = 1 ]
    then
        wget_opts="-O- -q -t 3 -T 3 --no-check-certificate"
        url="https://${hostname}:${port}/${status_page}"
    else
        wget_opts="-O- -q -t 3 -T 3"
        url="http://${hostname}:${port}/${status_page}"
    fi
    out1=`/usr/bin/wget ${wget_opts} "${url}"`
    sleep 1
    out2=`/usr/bin/wget ${wget_opts} "${url}"`
    if [ -z "$out1" -o -z "$out2" ]
    then
        echo "UNKNOWN - Local copy/copies of $status_page is empty."
        exit $ST_UK
    fi
}
# Derive per-second rates from the two samples
# (field 9 = handled connections, field 10 = requests)
get_vals() {
    tmp1_reqpsec=`echo ${out1}|awk '{print $10}'`
    tmp2_reqpsec=`echo ${out2}|awk '{print $10}'`
    reqpsec=`expr $tmp2_reqpsec - $tmp1_reqpsec`
    tmp1_conpsec=`echo ${out1}|awk '{print $9}'`
    tmp2_conpsec=`echo ${out2}|awk '{print $9}'`
    conpsec=`expr $tmp2_conpsec - $tmp1_conpsec`
    reqpcon=`echo "scale=2; $reqpsec / $conpsec" | bc -l`
    if [ "$reqpcon" = ".99" ]
    then
        reqpcon="1.00"
    fi
}
do_output() {
    output="nginx is running. $reqpsec requests per second, $conpsec connections per second ($reqpcon requests per connection)"
}
do_perfdata() {
    perfdata="'reqpsec'=$reqpsec 'conpsec'=$conpsec 'conpreq'=$reqpcon"
}
# Main program
get_wcdiff
val_wcdiff
if [ ${pid_check} = 1 ]
then
    check_pid
    if [ "$retval" = 1 ]
    then
        echo "There's no pid file for nginx. Is nginx running? Please also make sure whether your pid path and name is correct."
        exit $ST_CR
    fi
fi
get_status
get_vals
do_output
do_perfdata
if [[ -n "$warning" ]] && [[ -n "$critical" ]]
then
    if [[ "$reqpsec" -ge "$warning" ]] && [[ "$reqpsec" -lt "$critical" ]]
    then
        echo "WARNING - ${output} | ${perfdata}"
        exit $ST_WR
    elif [ "$reqpsec" -ge "$critical" ]
    then
        echo "CRITICAL - ${output} | ${perfdata}"
        exit $ST_CR
    else
        echo "OK - ${output} | ${perfdata} ]"
        exit $ST_OK
    fi
else
    echo "OK - ${output} | ${perfdata}"
    exit $ST_OK
fi
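The final block of the script compares requests per second with the -w and -c levels. The sketch below isolates that decision so it can be tried with arbitrary numbers; classify is a hypothetical name, it only prints the state instead of exiting, and it assumes the warning level is below the critical level, which the script validates before it gets this far.

```shell
#!/bin/sh
# Classify a requests-per-second value against warning and critical levels:
# CRITICAL at or above the critical level, WARNING at or above the warning
# level, otherwise OK. Assumes warn < crit.
classify() {
    value=$1 warn=$2 crit=$3
    if [ "$value" -ge "$crit" ]; then
        echo "CRITICAL"
    elif [ "$value" -ge "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}

# Thresholds taken from the command lines used earlier: -w 15000 -c 20000
classify 1 15000 20000
classify 16000 15000 20000
classify 25000 15000 20000
```

The three calls print OK, WARNING and CRITICAL in that order.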