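Overview: the company has 12 production machines in total: 2 LVS (active/standby), 2 nginx, 2 tomcat, 1 back-office server (nginx_tomcat), 3 mysql (master + standby + off-site disaster recovery), 1 image server, and 2 memcached.
As you can see, the site architecture is built around high availability: every tier has a primary and a standby. Page views are modest, so concurrency and raw performance are not hard requirements, but security and stability are. Log analysis, WAF configuration, vulnerability scanning and so on were already in place; what was still missing was monitoring, and after some deliberation I decided to use Nagios.
PS: a former colleague used Zabbix, which felt like too much for my dozen or so machines.
I followed material found online, and some of it is full of pitfalls, so I am posting my own cleaned-up notes here as a reference.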
I. Install Nagios
1. Install the dependencies
#rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
#yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel
#yum -y install httpd php mysql-devel php-mysql
2. Add the user and groups
#groupadd nagcmd
#useradd -G nagcmd nagios
#passwd nagios
#usermod -a -G nagcmd apache
3. Compile and install
#tar zxvf nagios-3.4.3.tar.gz
#cd nagios
#./configure --sysconfdir=/etc/nagios --with-command-group=nagcmd --enable-event-broker
#make all
#make install
#make install-init
#make install-commandmode
#make install-config
Create the Nagios web configuration file in httpd's configuration directory (conf.d):
#make install-webconf
Create a user for logging in to the Nagios web interface (the password used here is hopelessly weak; change it once setup is done):
#htpasswd -c /etc/nagios/htpasswd.users nagiosadmin
# password: nagios
httpd has to be restarted after the steps above:
service httpd restart
If httpd reports: Could not reliably determine the server's fully qualified ...
vi /etc/httpd/conf/httpd.conf
Add: ServerName localhost:80
4. Install nagios-plugins
#tar zxvf nagios-plugins-1.4.13.tar.gz
#cd nagios-plugins-1.4.13
Note: do not use the nagcmd group here
#./configure --with-nagios-user=nagios --with-nagios-group=nagios
#make all
#make install
5. Configure and start nagios
(1) Enable start at boot
# chkconfig --add nagios
# chkconfig --level 35 nagios on
# chkconfig --list nagios
(2) Check the configuration files for syntax errors
#/usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
(3) Start nagios
#service nagios restart
(4) Configure SELinux (in enforcing mode it blocks the CGI scripts)
#getenforce
#setenforce 0
#vi /etc/sysconfig/selinux -> SELINUX=disabled
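II. Nagios configuration
Only a brief description here; concrete configurations follow later
cgi.cfg
Configuration file controlling CGI access; if new CGI settings are added, they go here
nagios.cfg
The main Nagios configuration file
resource.cfg
Variable definition file, also called the resource file; variables such as $USER1$ are defined here so other configuration files can reference them. In other words, global variables
objects
A directory holding many configuration file templates that define Nagios objects
objects/commands.cfg
Command definitions, which other configuration files can reference
objects/contacts.cfg
Defines contacts and contact groups
objects/localhost.cfg
Defines monitoring of the local host
objects/printer.cfg
A template for monitoring printers; not enabled by default
objects/switch.cfg
A template for monitoring routers and switches; not enabled by default
objects/templates.cfg
Host and service templates that other configuration files can reference
objects/timeperiods.cfg
Defines the time periods used for monitoring
objects/windows.cfg
A template for monitoring Windows hosts; not enabled by default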
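III. Install NRPE (client side)
Note: NRPE (Nagios Remote Plugin Executor) is a daemon that runs check commands on a remote server. It lets the Nagios monitoring host trigger check commands on the remote host (optionally over SSL) and returns the results to the monitoring host. Its overhead is far lower than SSH-based checks, and the check does not need a system account on the remote host. The client needs both NRPE and the Nagios plugins installed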
1. Install the plugins
#useradd -s /sbin/nologin nagios
#yum grouplist
#yum -y groupinstall "Development Tools" "Development Libraries"
#tar zxvf nagios-plugins-1.4.13.tar.gz
#cd nagios-plugins-1.4.13
#./configure --with-nagios-user=nagios --with-nagios-group=nagios
#make all
#make install
2. Install nrpe
#tar zxvf nrpe-2.15.tar.gz
#cd nrpe-2.15
#./configure --with-nrpe-user=nagios --with-nrpe-group=nagios --with-nagios-user=nagios --with-nagios-group=nagios --enable-command-args --enable-ssl
#make all
#make install-plugin
#make install-daemon
#make install-daemon-config
3. Configure NRPE
# vi /usr/local/nagios/etc/nrpe.cfg
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
# change to this host's own IP
server_address=192.168.1.101
nrpe_user=nagios
nrpe_group=nagios
# change to the IP of the Nagios server
allowed_hosts=192.168.1.100
command_timeout=60
4. Start nrpe
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
To make starting NRPE easier, the following can be saved as the script /etc/init.d/nrped
#!/bin/bash
# Minimal start/stop wrapper for the NRPE daemon
NRPE=/usr/local/nagios/bin/nrpe
NRPECONF=/usr/local/nagios/etc/nrpe.cfg
case "$1" in
    start)
        echo -n "Starting NRPE daemon...."
        $NRPE -c $NRPECONF -d
        echo "done.."
        ;;
    stop)
        echo -n "Stopping NRPE daemon...."
        pkill -u nagios nrpe
        echo "done.."
        ;;
    restart)
        $0 stop
        sleep 1
        $0 start
        ;;
    *)
        echo "Usage: $0 start|stop|restart"
        ;;
esac
exit 0
5. Sample configuration
vi /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 10 -c 20
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sd1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda6
command[check_sd2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda3
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 200 -c 400
command[check_cpu]=/usr/local/nagios/libexec/check_cpu.sh -w 50 -c 80
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 50 -c 80
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
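Each command[...] entry above defines a name that the Nagios server can request through check_nrpe, so the names on both sides have to match. As a quick sanity check the names can be listed from the file. The sketch below is only an illustration and not part of NRPE: the function name list_nrpe_commands is made up here, and it assumes one definition per line in the command[name]=... form shown above.

```shell
#!/bin/bash
# list_nrpe_commands FILE
# Hypothetical helper: prints the name of every "command[name]=..." entry in
# an nrpe.cfg-style file, one per line. Commented-out entries are skipped
# because the pattern only allows whitespace before "command[".
list_nrpe_commands() {
    sed -n 's/^[[:space:]]*command\[\([^]]*\)]=.*/\1/p' "$1"
}

# Example on a throwaway file holding two of the entries above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
# a comment line
command[check_users]=/usr/local/nagios/libexec/check_users -w 10 -c 20
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
EOF
list_nrpe_commands "$sample"
rm -f "$sample"
```

On a client, pointing the function at /usr/local/nagios/etc/nrpe.cfg gives the list of names that may appear after check_nrpe! in the server's service definitions.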
IV. NRPE on the server side
1. Install NRPE
#tar zxvf nrpe-2.15.tar.gz
#cd nrpe-2.15
#./configure --with-nrpe-user=nagios --with-nrpe-group=nagios --with-nagios-user=nagios --with-nagios-group=nagios --enable-command-args --enable-ssl --with-mysql
#make all
#make install-plugin
2. Define how remote hosts and services are monitored
Remote Linux hosts are monitored through NRPE with the check_nrpe plugin, whose syntax is:
check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist...>]
Example:
define command
{
command_name check_swap_nrpe
command_line $USER1$/check_nrpe -H "$HOSTADDRESS$" -c "check_swap"
}
If you also want to pass the name of the remote command as a parameter when monitoring a remote Linux host, a definition like the following can be used:
#cd /etc/nagios/objects/
#vi commands.cfg    (add the following)
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
Here is a newly added configuration:
define host{
use linux-server
host_name linhost
alias My Linux Host
address 192.168.1.101
}
define service{
use generic-service
host_name linhost
service_description CHECK USERS
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name linhost
service_description Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name linhost
service_description SDA1
check_command check_nrpe!check_sd1
}
define service{
use generic-service
host_name linhost
service_description SDA2
check_command check_nrpe!check_sd2
}
define service{
use generic-service
host_name linhost
service_description Zombie
check_command check_nrpe!check_zombie_procs
}
define service{
use generic-service
host_name linhost
service_description total procs
check_command check_nrpe!check_total_procs
}
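In the service definitions above, a check_command value such as check_nrpe!check_users is split on ! by Nagios: the first field selects the command definition and the following fields become $ARG1$, $ARG2$ and so on, which are substituted into that definition's command_line. The bash sketch below only imitates this substitution to make the mechanism visible. expand_check_command is a hypothetical helper, it handles just $USER1$, $HOSTADDRESS$ and $ARG1$, and the default libexec path is an assumption.

```shell
#!/bin/bash
# expand_check_command TEMPLATE CHECK_COMMAND HOSTADDRESS [USER1]
# Hypothetical illustration of how a command_line template gets filled in.
expand_check_command() {
    local template=$1 check=$2 addr=$3 user1=${4:-/usr/local/nagios/libexec}
    local arg1=""
    # $ARG1$ is the text between the first "!" and the next "!" (if any).
    case $check in
        *'!'*) arg1=${check#*!}; arg1=${arg1%%!*} ;;
    esac
    local line=$template
    # The quoted patterns are matched literally, so "$" is not special here.
    line=${line//'$USER1$'/$user1}
    line=${line//'$HOSTADDRESS$'/$addr}
    line=${line//'$ARG1$'/$arg1}
    printf '%s\n' "$line"
}

expand_check_command '$USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$' \
    'check_nrpe!check_users' 192.168.1.101
```

The printed line is the command the server would run for that service; running it by hand as the nagios user is a convenient way to debug a check that shows up as UNKNOWN.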
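V. Adding custom monitoring scripts
Things like CPU, memory, and LVS need scripts of your own. It is actually easy as long as you mind two points: control the input (parameters and so on) and format the output. As long as the output follows the format Nagios recognizes, it works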
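Those two points amount to a small contract: print one status line, optionally followed by | and performance data, and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN). A minimal sketch of that contract follows. The function name report_threshold and its label argument are invented for this example, and it assumes non-negative integer values where higher is worse.

```shell
#!/bin/bash
# report_threshold LABEL VALUE WARN CRIT
# Prints a Nagios-style status line and returns the matching state code.
report_threshold() {
    local label=$1 value=$2 warn=$3 crit=$4 state code n
    # Input control: every number must be a non-negative integer.
    for n in "$value" "$warn" "$crit"; do
        case $n in
            ''|*[!0-9]*) echo "UNKNOWN - invalid number: '$n'"; return 3 ;;
        esac
    done
    if [ "$value" -ge "$crit" ]; then
        state=CRITICAL; code=2
    elif [ "$value" -ge "$warn" ]; then
        state=WARNING; code=1
    else
        state=OK; code=0
    fi
    # Formatted output: human-readable text, "|", then label=value;warn;crit
    echo "$state - $label is $value | $label=$value;$warn;$crit"
    return $code
}

report_threshold users 3 10 20
echo "exit code: $?"
```

In a real plugin the return value would be passed to exit, since Nagios reads the state from the process exit code and not from the text.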
1. CPU monitoring
vi check_cpu.sh
#!/bin/bash
# Filename: check_cpu.sh
# Reports the percentage of CPU in use, read from procinfo or sar.
procinfo=`which procinfo 2>/dev/null`
sar=`which sar 2>/dev/null`
function help {
    echo -e "\n\tThis plugin shows the % of used CPU, using either procinfo or sar (whichever is available)\n\n\t$0:\n\t\t-c <integer>\tIf the % of used CPU is above <integer>, returns CRITICAL state\n\t\t-w <integer>\tIf the % of used CPU is below CRITICAL and above <integer>, returns WARNING state\n"
    # 3 is the Nagios UNKNOWN state
    exit 3
}
# Getting parameters:
while getopts "w:c:h" OPT; do
    case $OPT in
        "w") warning=$OPTARG;;
        "c") critical=$OPTARG;;
        "h") help;;
    esac
done
# Checking parameters:
if [ -z "$warning" ] || [ -z "$critical" ]; then
    echo "ERROR: You must specify warning and critical levels"
    help
fi
if [ "$warning" -ge "$critical" ]; then
    echo "ERROR: critical level must be higher than warning level"
    help
fi
# Assuring that the needed tools exist (no subshells, so $command stays set
# and exit really ends the script):
if [ -n "$procinfo" ] && [ -x "$procinfo" ]; then
    command="procinfo"
elif [ -n "$sar" ] && [ -x "$sar" ]; then
    command="sar"
else
    echo "ERROR: You must have either procinfo or sar installed in order to run this plugin"
    exit 3
fi
# Doing the actual check:
if [ "$command" = "procinfo" ]; then
    idle=`$procinfo | grep idle | cut -d% -f1 | awk '{print $NF}' | cut -d. -f1`
else
    # Column 8 of the last sar line is taken as %idle
    idle=`$sar | tail -1 | awk '{print $8}' | cut -d. -f1`
fi
used=`expr 100 - $idle`
# Comparing the result and setting the correct level:
if [ "$used" -ge "$critical" ]; then
    msg="CRITICAL"
    status=2
elif [ "$used" -ge "$warning" ]; then
    msg="WARNING"
    status=1
else
    msg="OK"
    status=0
fi
# Printing the results:
echo "$msg - CPU used=$used% idle=$idle% | 'CPU Usage'=$used%;$warning;$critical;"
# Bye!
exit $status
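Change the owner and group and add execute permission; the same applies to every script below
#chown nagios:nagios check_cpu.sh
#chmod +x check_cpu.sh
#./check_cpu.sh -w 60 -c 80
[Problem] The script relies on sar to read system resource usage, and sar may not be installed
Solution:
#yum -y install sysstat
The first run can fail because a data file for the current day (named after the date) has to exist first: sar -o 16
The monitored hosts need the same setup (omitted)
[Note] Add a crontab entry so that the file holding the CPU records is generated every day
#crontab -e    (remember to check that the cron job is actually active)
1 0 * * * /usr/lib64/sa/sa1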
2. Memory monitoring
vi check_mem.sh
#!/bin/bash
#DESC: OS mem check
#Author:James
function help {
    echo -e "\n\tThis plugin shows the % of used MEM, using free\n\n\t$0:\n\t\t-c <integer>\tIf the % of used MEM is above <integer>, returns CRITICAL state\n\t\t-w <integer>\tIf the % of used MEM is below CRITICAL and above <integer>, returns WARNING state\n"
    # 3 is the Nagios UNKNOWN state
    exit 3
}
while getopts "w:c:h" OPT; do
    case $OPT in
        "w") warning=$OPTARG;;
        "c") critical=$OPTARG;;
        "h") help;;
    esac
done
if [ -z "$warning" ] || [ -z "$critical" ]; then
    help
fi
# Assumes the older "free" layout, where the Mem: line reads
# total used free shared buffers cached
set `free|head -2|tail -1`
MEMTOTAL=$2
MEMUSED=$3
MEMFREE=$4
MEMBUFFERS=$6
MEMCACHED=$7
# Memory really in use excludes buffers and page cache
REALMEMUSED=`echo "$MEMUSED - $MEMBUFFERS - $MEMCACHED" | bc`
USEPCT=`echo "scale=3; $REALMEMUSED / $MEMTOTAL * 100" | bc -l`
REALMEMUSEDmb=`echo "($REALMEMUSED)/1024" | bc`
MEMTOTALMB=`echo "($MEMTOTAL)/1024" | bc`
if [ `echo "$USEPCT > $critical" | bc` == 1 ]; then
    echo "MEM CRITICAL - Memory usage = ${USEPCT}%,MEMTOTAL=${MEMTOTALMB}MB,RealUsed=${REALMEMUSEDmb}MB |Used=${USEPCT}%;$warning;$critical"
    exit 2
elif [ `echo "$USEPCT > $warning" | bc` == 1 ]; then
    echo "MEM WARNING - Memory usage = ${USEPCT}%,MEMTOTAL=${MEMTOTALMB}MB,RealUsed=${REALMEMUSEDmb}MB |Used=${USEPCT}%;$warning;$critical"
    exit 1
elif [ `echo "$USEPCT <= $warning" | bc` == 1 ]; then
    echo "MEM OK - Memory usage = ${USEPCT}%,MEMTOTAL=${MEMTOTALMB}MB,RealUsed=${REALMEMUSEDmb}MB |Used=${USEPCT}%;$warning;$critical"
    exit 0
else
    echo "MEM ERROR - Unable to determine memory usage"
    exit 3
fi
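The percentage computed by the script above is (used - buffers - cached) / total. The same arithmetic can be tried in isolation with awk in place of bc. In the sketch below mem_used_pct is a made-up name, and the four numbers are passed in explicitly (in KB, as free prints them) so that it does not depend on any particular free output layout.

```shell
#!/bin/bash
# mem_used_pct TOTAL USED BUFFERS CACHED
# Prints the "really used" share of memory as a percentage with one decimal.
mem_used_pct() {
    awk -v total="$1" -v used="$2" -v buffers="$3" -v cached="$4" 'BEGIN {
        # Refuse a zero or negative total instead of dividing by it.
        if (total <= 0) { print "invalid total" > "/dev/stderr"; exit 3 }
        printf "%.1f\n", (used - buffers - cached) * 100 / total
    }'
}

# 6000000 KB used, of which 2000000 KB is buffers + cache, out of 8000000 KB
mem_used_pct 8000000 6000000 500000 1500000
```

With these sample numbers the result is 50.0, which would sit exactly on a -w 50 threshold.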
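3. LVS monitoring
vi check_lvs.sh
#!/bin/bash
# Sums the active and inactive connections of the real servers behind LVS.
# The nagios user needs sudo rights for ipvsadm.
USAGE_Method="$(basename $0) [-h|--hostname] <Free ip or hostname> [-w|--warning] <Free integer> [-c|--critical] <Free integer>"
USAGE_Value="warning value must be smaller than critical value: `basename $0` $*"
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
if [ $# -lt 4 ]; then
    echo "Usage: $USAGE_Method"
    exit $STATE_UNKNOWN
fi
while [ $# -gt 0 ]; do
    case "$1" in
        -w|--warning)
            shift
            warning=$1
            ;;
        -c|--critical)
            shift
            critical=$1
            ;;
    esac
    shift
done
if [ "$warning" -ge "$critical" ]; then
    echo "$USAGE_Value"
    echo "Usage: $USAGE_Method"
    exit $STATE_UNKNOWN
fi
ACT_COUNT=0
Inactive_count=0
stat1=`sudo ipvsadm | grep http | grep Route | wc -l`
if [ "$stat1" -ne 0 ]; then
    # Column 5 is ActiveConn, column 6 is InActConn
    for NUM in `sudo ipvsadm | grep http | grep Route | awk '{print $5}'`; do
        ACT_COUNT=$(($ACT_COUNT + $NUM))
    done
    for NUM in `sudo ipvsadm | grep http | grep Route | awk '{print $6}'`; do
        Inactive_count=$(($Inactive_count + $NUM))
    done
else
    echo "stat1:$stat1, lvs critical, lvs is down now."
    exit $STATE_CRITICAL
fi
# The script as posted ends here without reporting a result. The lines below
# are an assumed completion: the thresholds are applied to the active
# connection count.
if [ "$ACT_COUNT" -ge "$critical" ]; then
    echo "LVS CRITICAL - active=$ACT_COUNT inactive=$Inactive_count | active=$ACT_COUNT;$warning;$critical inactive=$Inactive_count"
    exit $STATE_CRITICAL
elif [ "$ACT_COUNT" -ge "$warning" ]; then
    echo "LVS WARNING - active=$ACT_COUNT inactive=$Inactive_count | active=$ACT_COUNT;$warning;$critical inactive=$Inactive_count"
    exit $STATE_WARNING
else
    echo "LVS OK - active=$ACT_COUNT inactive=$Inactive_count | active=$ACT_COUNT;$warning;$critical inactive=$Inactive_count"
    exit $STATE_OK
fi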
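The two for loops in the script above add up the ActiveConn and InActConn columns of the real-server rows. For experimenting without an LVS host, the same summation can be done in a single awk pass over captured text. The sample text below is made up to resemble ipvsadm output and is for illustration only; sum_lvs_conns is a hypothetical name, and the column positions ($5 and $6) are the ones the script uses.

```shell
#!/bin/bash
# sum_lvs_conns: reads ipvsadm-style text on stdin and prints
# "ACTIVE INACTIVE", summed over rows that contain both "http" and "Route".
sum_lvs_conns() {
    awk '/http/ && /Route/ { act += $5; inact += $6 }
         END { printf "%d %d\n", act, inact }'
}

# Made-up sample: one virtual service with two real servers.
sum_lvs_conns <<'EOF'
TCP  192.168.1.200:http rr
  -> 192.168.1.11:http            Route   1      12         30
  -> 192.168.1.12:http            Route   1      8          25
EOF
```

On a real director the input would come from sudo ipvsadm, which also means ipvsadm is called once per check instead of three times.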
4. MySQL monitoring
On the MySQL server to be monitored, create a database dedicated to Nagios
mysql> create database nagdb default CHARSET=utf8;
mysql> grant select on nagdb.* to 'nagios'@'192.168.1.100';
mysql> update mysql.user set Password = PASSWORD('nagios') where user='nagios';
mysql> flush privileges;
#/usr/local/nagios/libexec/check_mysql -H 192.168.1.101 -u nagios -d nagdb -p nagios -w 10 -c 30
5. memcached monitoring
This uses a plugin written in Perl that needs several dependency packages, which makes it tedious to set up
(1) Install the modules
#yum -y install perl-Carp-Clan perl-Cache-Memcached perl-Nagios-Plugin
- If they cannot be installed:
#wget http://dag.wieers.com/rpm/packages/rpmforge-release/rpmforge-release-0.5.2-2.rf.src.rpm
#rpm -ivh rpmforge-release-0.5.2-2.rf.src.rpm
#yum -y install perl-Nagios-Plugin.noarch perl-Carp-Clan.noarch perl-Cache-Memcached.noarch
- If perl-Nagios-Plugin still cannot be installed:
wget http://packages.sw.be/perl-Nagios-Plugin/perl-Nagios-Plugin-0.33-1.el5.rf.noarch.rpm
rpm -ivh perl-Nagios-Plugin-0.33-1.el5.rf.noarch.rpm --force --nodeps
(2) Install the plugin
Download Nagios-Plugins-Memcached-0.02.tar.gz and install it (it has many dependencies; pay attention to where the .pm files end up)
#tar xzvf Nagios-Plugins-Memcached-0.02.tar.gz
#cd Nagios-Plugins-Memcached-0.02
#yum -y install perl-CPAN
# perl Makefile.PL
- This asks a few questions; answer as you see fit or just press Enter all the way through
# make
- At this point it downloads some things needed at run time
# make install
- check_memcached ends up under /usr/bin or /usr/local/bin (the copy below assumes /usr/local/bin)
- No problem, just copy it into the Nagios libexec directory
#cp /usr/local/bin/check_memcached /usr/local/nagios/libexec/
#chown nagios:nagios /usr/local/nagios/libexec/check_memcached
Add the following entries to commands.cfg (I did not install check_memcached on the memcached servers; the check_memcached on the Nagios host talks directly to port 11211 of the memcached server. You can of course install it on the memcached server and fetch its values through check_nrpe)
define command {
command_name check_memcached_11211
command_line $USER1$/check_memcached -H 192.168.1.101:11211 --size-warning 80 --size-critical 90
}
The one above monitors the memory usage ratio of memcached
define command {
command_name check_memcached_response_11211
command_line /usr/local/bin/check_memcached -H 192.168.1.101 -w 300 -c 500
}
This one checks whether memcached is still responding
define command {
command_name check_memcached_hit
command_line /usr/local/bin/check_memcached -H 192.168.1.101 --hit-warning 10 --hit-critical 5
}
./check_memcached -H 192.168.108.96 -w 300 -c 500
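VI. Alert configuration
1. sendmail
First make sure the sendmail components are fully installed; the following command installs them:
# yum install -y sendmail*
Then restart the sendmail service:
# service sendmail restart
[Problem] sendmail can be slow to send mail; in that case edit the hosts file
vi /etc/hosts
# jd.com is there so that mail is sent as nagios.jd.com; server1 is the host name
127.0.0.1 server1 jd.com localhost server1
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
Then send a test mail to verify that sendmail works:
# echo "Hello World" | mail david.tang@bsmart.cn
PS: it is best to add nagios.jd.com to the whitelist of your mailbox; free mail providers filter aggressively
2. Modify the configuration
Edit the contact's email address in contacts.cfg (/etc/nagios/objects/contacts.cfg with the --sysconfdir used above, /usr/local/nagios/etc/objects/contacts.cfg in a default install)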
VII. Graphing with pnp4nagios
I got lazy here: instead of drawing graphs with Cacti I used a Nagios add-on directly. The results are decent enough
1. Install rrdtool
#yum -y install cairo-devel libxml2-devel pango-devel pango libpng-devel freetype freetype-devel libart_lgpl-devel
#wget http://oss.oetiker.ch/rrdtool/pub/rrdtool-1.4.7.tar.gz
#tar -zxvf rrdtool-1.4.7.tar.gz
#cd rrdtool-1.4.7
#./configure --prefix=/usr/local/rrdtool
#make && make install
#echo 'export PATH="/usr/local/rrdtool/bin:$PATH"' >> /etc/profile
2. Install pnp4nagios-0.6.6.tar.gz
nagios $> tar zxvf pnp4nagios-0.6.6.tar.gz
nagios $> cd pnp4nagios-0.6.6
nagios $> ./configure --with-nagios-user=nagios --with-nagios-group=nagios
nagios $> make all
nagios $> make install
nagios $> make install-webconf
nagios $> make install-config
nagios $> make install-init
3. Create the configuration files
nagios $> cd /usr/local/pnp4nagios/etc
nagios $> mv misccommands.cfg-sample misccommands.cfg
nagios $> mv nagios.cfg-sample nagios.cfg
nagios $> mv npcd.cfg-sample npcd.cfg
nagios $> mv process_perfdata.cfg-sample process_perfdata.cfg
nagios $> mv rra.cfg-sample rra.cfg
nagios $> cd pages
nagios $> mv web_traffic.cfg-sample web_traffic.cfg
nagios $> cd ../check_commands
nagios $> mv check_all_local_disks.cfg-sample check_all_local_disks.cfg
nagios $> mv check_nrpe.cfg-sample check_nrpe.cfg
nagios $> mv check_nwstat.cfg-sample check_nwstat.cfg
4. Restart the service
nagios $> /etc/init.d/npcd restart
5. Edit the nagios configuration file
nagios $> cd /etc/nagios
nagios $> vim nagios.cfg
# uncomment or set these options:
process_performance_data=1
host_perfdata_command=process-host-perfdata
service_perfdata_command=process-service-perfdata
6. Edit commands.cfg
nagios $> cd /etc/nagios/objects
nagios $> vim commands.cfg
## add
# 'process-host-perfdata' command definition
define command{
command_name process-host-perfdata
command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl
}
# 'process-service-perfdata' command definition
define command{
command_name process-service-perfdata
command_line /usr/local/pnp4nagios/libexec/process_perfdata.pl
}
7. Add the "little sun" templates, which embed the graph icon in the Nagios pages
define host{
name host-pnp
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=_HOST_
register 0
}
define service{
name srv-pnp
action_url /pnp4nagios/graph?host=$HOSTNAME$&srv=$SERVICEDESC$
register 0
}
8. Edit hosts.cfg and services.cfg
nagios $> cd /etc/nagios/objects
# edit hosts.cfg
nagios $> vim hosts.cfg
define host{
use linux-server,host-pnp
host_name eric.com
alias eric.com
address 192.168.1.100
}
# edit services.cfg
define service{
use local-service,srv-pnp
host_name eric.com
service_description PING
check_command check_ping!100.0,20%!500.0,60%
}
9. Restart the nagios service
#vi /etc/httpd/conf.d/pnp4nagios.conf --> change the following
AuthUserFile /usr/local/nagios/etc/htpasswd.users
nagios $> service nagios restart
nagios $> service httpd restart
10. Notes:
(1) For graphs to be drawn properly, delete
/usr/local/pnp4nagios/share/install.php
/usr/local/pnp4nagios/var/perfdata/
(2) To rebuild the graphs from scratch, delete the rrd and xml files under
/usr/local/pnp4nagios/var/perfdata/
(3) If the Nagios configuration files live under /etc (as in this install), the setting has to be changed, otherwise authentication fails
vi /etc/httpd/conf.d/pnp4nagios.conf
AuthUserFile /etc/nagios/htpasswd.users
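Before deleting rrd and xml files under the perfdata directory as described in the notes above, it can help to list what would be removed. A small dry-run sketch follows. list_perfdata_files is a made-up name, and the directory is passed as an argument so the demonstration can run on a scratch directory instead of the live tree.

```shell
#!/bin/bash
# list_perfdata_files DIR
# Prints every *.rrd and *.xml file below DIR, sorted; nothing is deleted.
list_perfdata_files() {
    find "$1" -type f \( -name '*.rrd' -o -name '*.xml' \) | sort
}

# Demonstration on a scratch directory that mimics perfdata/<host>/<service>.*
demo=$(mktemp -d)
mkdir -p "$demo/mysql"
touch "$demo/mysql/Load.rrd" "$demo/mysql/Load.xml" "$demo/mysql/notes.txt"
list_perfdata_files "$demo"
rm -rf "$demo"
```

On the Nagios host the argument would be /usr/local/pnp4nagios/var/perfdata. Once the list looks right, the files can be removed by hand or by piping the list to a delete step of your choice.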
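VIII. More on graphing
1. How it works
PNP draws its graphs from the strings the check plugins print. The check_procs plugin shipped with the Nagios plugins prints a string that does not follow the data format (it contains no performance data), so nothing can be graphed for it
-- Nagios performance data format
Example: cpu_user:OK-0% cpu_system:OK-0% cpu_idle:WARNING-99>70% | cpu_user=0%;120;90; cpu_system=0%;100;70; cpu_idle=99%;100;70;
The performance data is the part after the |, and each item has the form:
'label'=value[UOM];[warn];[crit];[min];[max]
Notes:
1. Label/value pairs are separated by spaces, e.g. cpu_user=0%;100;90; cpu_system=0%;100;70; cpu_idle=99%;100;70;
2. A label may contain any characters
3. The single quotes may be omitted; if the label contains spaces, equals signs, or single quotes, it must be enclosed in single quotes, e.g. 'cpu user'=0%;100;90;
4. A label can be any length, but it is best kept under 19 characters and unique (RRD has limits here); also mind the limit on how much output NRPE returns (NRPE 2.x caps it at about 1 KB)
5. Two consecutive single quotes stand for a literal single quote inside a quoted label
6. warn, crit, min, and max may be empty (for example when no threshold is defined, or when a minimum and maximum do not apply), and trailing semicolons may be omitted
7. If the UOM is %, min and max are not needed
8. value, min, and max may contain only the minus sign "-", the digits 0 to 9, and the decimal point ".", and must share the same unit, e.g. cpu_user=0.5%;99.9;-9;
9. warn and crit must be in the range format (see section 2.5 of the plugin development guidelines) and must use the same unit as well
10. UOM must be one of the following
(1) Not specified: a plain number, integer or float (number of users, processes, load averages and so on)
(2) s - seconds (also us for microseconds and ms for milliseconds) cpu_user=0s;100;90; cpu_system=0us;100;70; cpu_idle=0ms;100;70;
(3) % - percentage cpu_user=0%;100;90; cpu_system=0%;100;70; cpu_idle=99%;100;70;
(4) B - bytes (also KB, MB, TB) cpu_user=0KB;100;90; cpu_system=0MB;100;70; cpu_idle=0B;100;70;
(5) c - a continuous counter (such as traffic on a network interface) cpu_user=10c;100;90;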
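To make the label=value[UOM];warn;crit;min;max layout concrete, here is a small sketch that splits one performance data item into its fields. parse_perfdata is a hypothetical helper written for this note and is not part of Nagios or PNP. It assumes an unquoted label that contains no = and a value made only of digits, minus sign, and decimal point.

```shell
#!/bin/bash
# parse_perfdata ITEM
# Splits one "label=value[UOM];warn;crit;min;max" item and prints its fields.
parse_perfdata() {
    local item=$1
    local label=${item%%=*}      # text before the first "="
    local rest=${item#*=}        # everything after it
    local value_uom warn crit min max
    # Split the remainder on ";" (missing trailing fields stay empty).
    IFS=';' read -r value_uom warn crit min max <<EOF
$rest
EOF
    # Strip the leading run of [-0-9.]; whatever is left is the UOM.
    local uom=${value_uom##*[-0-9.]}
    local value=${value_uom%"$uom"}
    printf 'label=%s value=%s uom=%s warn=%s crit=%s min=%s max=%s\n' \
        "$label" "$value" "$uom" "$warn" "$crit" "$min" "$max"
}

parse_perfdata 'cpu_idle=99%;100;70;'
```

Feeding it the part after the | from any plugin's output, one item at a time, is a quick way to see whether a custom script emits something PNP can graph.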
2. Graphing total_process
Edit the file nagios-plugins-1.4.15/plugins/check_procs.c
Find the main (int argc, char **argv) function and add a new variable perf:
char *perf;
perf = strdup("");
At the end of the function, just before return result;, change printf ("\n"); to:
asprintf(&perf, "%s", perfdata ("processes", procs, "",
TRUE, wmax,
TRUE, cmax,
TRUE, 0,
FALSE, 0));
printf ("|%s\n",perf);
Recompile the source and replace the old check_procs with the newly built one
3. Add MySQL monitoring
(1) Download
#yum install perl-Class-DBI-mysql
http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=174&cf_id=30
http://exchange.nagios.org/components/com_mtree/attachment.php?link_id=174&cf_id=36
(2) Install the plugin and its PNP template
# cp check_mysqld.pl /usr/local/nagios/libexec
# chmod 755 /usr/local/nagios/libexec/check_mysqld.pl
# chown nagios:nagios /usr/local/nagios/libexec/check_mysqld.pl
# cp check_mysqld.php /usr/local/pnp4nagios/share/templates.dist
# chown nagios:nagios /usr/local/pnp4nagios/share/templates.dist/check_mysqld.php
# chmod 755 /usr/local/pnp4nagios/share/templates.dist/check_mysqld.php
(3) Define the command
# vi commands.cfg
define command{
command_name check_mysqld
command_line $USER1$/check_mysqld.pl -H $HOSTADDRESS$ -u $ARG1$ -p $ARG2$ -D $ARG3$ -a uptime,threads_connected,questions,slow_queries,open_tables -w ',,,,' -c ',,,,' -A $USER21$
}
(4) Define the macros
#vi resource.cfg
$USER7$=nagios
$USER21$='com_select,com_update,com_insert,com_insert_select,com_commit,com_delete,com_rollback,aborted_clients,aborted_connects,binlog_cache_disk_use,binlog_cache_use,bytes_received,bytes_sent,connections,created_tmp_disk_tables,created_tmp_files,created_tmp_tables,delayed_errors,delayed_insert_threads,delayed_writes,handler_update,handler_write,handler_delete,handler_read_first,handler_read_key,handler_read_next,handler_read_prev,handler_read_rnd,handler_read_rnd_next,key_blocks_not_flushed,key_blocks_unused,key_blocks_used,key_read_requests,key_reads,key_write_requests,key_writes,max_used_connections,not_flushed_delayed_rows,open_files,open_streams,open_tables,opened_tables,prepared_stmt_count,qcache_free_blocks,qcache_free_memory,qcache_hits,qcache_inserts,qcache_lowmem_prunes,qcache_not_cached,qcache_queries_in_cache,qcache_total_blocks,questions,select_full_join,select_range_check,slow_launch_threads,slow_queries,table_locks_immediate,table_locks_waited,threads_cached,threads_connected,threads_created,threads_running'
(5) Define the service
#vi mysql.cfg
define service{
use generic-service,srv-pnp
host_name mysql
service_description Mysqld_pnp
check_command check_mysqld!nagios!nagios!nagdb
}
Here is a fairly complete monitoring configuration that I actually use:
vi mysql.cfg
define host{
use linux-server,host-pnp
host_name mysql
alias My mysql Host
address 192.168.1.101
}
#define service{
# use generic-service
# host_name nginx
# service_description http
# check_command check_http!3!10
# }
#define service{
# use generic-service
# host_name nginx
# service_description PING
# check_command check_ping!100.0,20%!500.0,60%
# }
#define service{
# use generic-service
# host_name nginx
# service_description tomcat
# check_command check_tomcat!15!30
# }
define service{
use generic-service
host_name mysql
service_description Mysqld
check_command check_mysql!nagios!nagios!10!60
}
define service{
use generic-service,srv-pnp
host_name mysql
service_description Mysqld_pnp
check_command check_mysqld!nagios!nagios!nagdb
}
define service{
use generic-service,srv-pnp
host_name mysql
service_description CHECK USERS
check_command check_nrpe!check_users
}
# Create a service for monitoring the load average
# Change the host_name to match the name of the host you defined above
define service{
use generic-service,srv-pnp
host_name mysql
service_description Load
check_command check_nrpe!check_load
}
# Create a service for monitoring disk usage (check_sd1)
# Change the host_name to match the name of the host you defined above
define service{
use generic-service,srv-pnp
host_name mysql
service_description SDA1
check_command check_nrpe!check_sd1
}
# Create a service for monitoring disk usage (check_sd2)
# Change the host_name to match the name of the host you defined above
define service{
use generic-service,srv-pnp
host_name mysql
service_description SDA2
check_command check_nrpe!check_sd2
}
# Create a service for monitoring zombie processes
# Change the host_name to match the name of the host you defined above
define service{
use generic-service,srv-pnp
host_name mysql
service_description Zombie
check_command check_nrpe!check_zombie_procs
}
# Create a service for monitoring the total number of processes
# Change the host_name to match the name of the host you defined above
define service{
use generic-service,srv-pnp
host_name mysql
service_description total procs
check_command check_nrpe!check_total_procs
}
define service{
use generic-service,srv-pnp
host_name mysql
service_description Cpu
check_command check_nrpe!check_cpu
}
define service{
use generic-service,srv-pnp
host_name mysql
service_description Mem
check_command check_nrpe!check_mem
}
#define service{
# use generic-service
# host_name mysql
# service_description Http
# check_command check_http!/
# }
define service{
use generic-service,srv-pnp
host_name mysql
service_description Ping
check_command check_ping!100.0,20%!500.0,60%
}
define service{
use generic-service,srv-pnp
host_name mysql
service_description check_memcached_11211
check_command check_memcached_11211!80!100
}
define service{
use generic-service,srv-pnp
host_name mysql
service_description check_memcached_response_11211
check_command check_memcached_response_11211!300!500
}
define service{
use generic-service,srv-pnp
host_name mysql
service_description check_memcached_hit
check_command check_memcached_hit!10!5
}
And that is basically it.
概述:公司的生产机器一共有 12 台,2 台 LVS(主备)、2 台 nginx、2 台 tomcat、1 台后台服务器 (nginx_tomcat)、3 台 mysql(主 + 备 + 异地灾备)、1 台图片服务器、2 台 memcached.
可以看出网站的架构就是基于高可用的原理的,每个层面都做了主备、系统的 PV 不高,对于并发布,高性能没有那么苛求,对于系统安全、稳定有较高要求,前期已经对系统做了各种日志分析,WAF 配置,漏洞扫面等等,现在还需要对系统进行监控,考虑再三还是决定使用 Nagios 来做。
PS:之前的同事用的 Zabbix,表示我这十几台机器真是伤不起。。。
照着网上的材料来做,有的地方实在是坑。。把自己整理出来的结果发出来,给大家做个参考
————————————– 分割线 ————————————–
在 Ubuntu 下配置 Mrtg 监控 Nginx 和服务器系统资源 http://www.linuxidc.com/Linux/2013-08/88417.htm
使用 snmp+Mrtg 监控 Linux 系统 http://www.linuxidc.com/Linux/2012-11/73561.htm
Mrtg 服务器搭建(监控网络流量)http://www.linuxidc.com/Linux/2012-07/64315.htm
网络监控器 Nagios 全攻略 http://www.linuxidc.com/Linux/2013-07/87067.htm
Nagios 搭建与配置详解 http://www.linuxidc.com/Linux/2013-05/84848.htm
Nginx 环境下构建 Nagios 监控平台 http://www.linuxidc.com/Linux/2011-07/38112.htm
在 RHEL5.3 上配置基本的 Nagios 系统(使用 Nagios-3.1.2) http://www.linuxidc.com/Linux/2011-07/38129.htm
CentOS 5.5+Nginx+Nagios 监控端和被控端安装配置指南 http://www.linuxidc.com/Linux/2011-09/44018.htm
Ubuntu 13.10 Server 安装 Nagios Core 网络监控运用 http://www.linuxidc.com/Linux/2013-11/93047.htm
————————————– 分割线 ————————————–
一、安装 Nagios
1、安装依赖包
#rpm -q gcc glibc glibc-common gd gd-devel xinetd openssl-devel
#yum install -y gcc glibc glibc-common gd gd-devel xinetd openssl-devel
#yum – y install httpd php mysql-devel php-mysql
2、添加用户和组
#groupadd nagcmd
#useradd -G nagcmd nagios
#passwd nagios
#usermod -a -G nagcmd apache
3、编译安装
#tar nagios-3.4.3.tar.gz
#cd nagios
#./configure –sysconfdir=/etc/nagios –with-command-group=nagcmd –enable-event-broker
#make all
#make install
#make install-init
#make install-commandmode
#make install-config
在 http 的配置文件目录【conf.d】中创建 nagios 的 web 程序配置文件
#make install-webconf
创建一个登陆 nagios web 程序的用户,用这个账号登陆 nagios(这是彻底的弱口令,配置完建议把密码修改掉)
#htpasswd -c /etc/nagios/htpasswd.users nagiosadmin
# 密码:nagios
以上配置过程需重新启动 httpd:
service httpd restart
报错信息:Could not reliably determine the server’s fully qualified
vi /etc/httpd/conf/httpd.conf
加入:ServerName localhost:80
4、安装 nagios-plugins
#tar zxvf nagios-plugins-1.4.13.tar.gz
#cd nagios-plugins-1.4.13
注意:组不使用 nagcmd
#./configure –with-nagios-user=nagios –with-nagios-group=nagios
#make all
#make install
5. 配置并启动 nagios
(1)加入开机启动 –
# chkconfig –add nagios<BR># chkconfig –level 35 nagios on<BR># chkconfig –list nagios
(2)检查其配置文件的语法是否正确
#/usr/local/nagios/bin/nagios -v /etc/nagios/nagios.cfg
(3)启动 nagios
#service nagios restart
(4)配置 selinux【会阻止 CGI 脚本】
1 <SPAN style=”FONT-SIZE: 14px”>#getenforce<BR>#setenfore 0<BR>#vi /etc/sysconfig/selinux ->SELINUX=disabled</SPAN>
二、Nagios 配置
这里只做简要说明,后续会贴出具体的配置
cgi.cfg
控制 CGI 访问的配置文件,如何新加了 cgi 配置文件,需要在这里增加
nagios.cfg
Nagios 主配置文件
resource.cfg
变量定义文件,又称为资源文件,在些文件中定义变量,以便由其他配置文件引用,如 $USER1$,好吧,其实就就是全局变量
objects
objects 是一个目录,在此目录下有很多配置文件模板,用于定义 Nagios 对象
objects/commands.cfg
命令定义配置文件,其中定义的命令可以被其他配置文件引用
objects/contacts.cfg
定义联系人和联系人组的配置文件
objects/localhost.cfg
定义监控本地主机的配置文件
objects/printer.cfg
定义监控打印机的一个配置文件模板,默认没有启用此文件
objects/switch.cfg
定义监控路由器的一个配置文件模板,默认没有启用此文件
objects/templates.cfg
定义主机和服务的一个模板配置文件,可以在其他配置文件中引用
objects/timeperiods.cfg
定义 Nagios 监控时间段的配置文件
objects/windows.cfg
监控 Windows 主机的一个配置文件模板,默认没有启用此文件
三、NRPE 安装【客户端】
说明:NRPE(nagios remore plugin execute)远程插件执行器,用于在远端服务区上运行监测命令的守护进程,它用于让 nagios 监控端基于安装的方式出发远端主机上的检测命令,并将检测结果输出至监控端。而其执行的开销远低于基于 ssh 的检测方式,而且检测过程中并不需要远程主机上的系统账号等信息。必须在客户端安装 nrpe 的 nagios 的 plugin
1、安装 plugin
#useradd -s /sbin/nologin nagios
#yum grouplist
#yum -y groupinstall “Development Tools” “Development Libraries”
#tar zxvf nagios-plugins-1.4.13.tar.gz
#cd nagios-plugins-1.4.13
#./configure –with-nagios-user=nagios –with-nagios-group=nagios
#make all
#make install
2、安装 nrpe
#tar zxvf nrpe-2.15.tar.gz
#cd nrpe-2.15
#./configure –with-nrpe-user=nagios -with-nrpe-group=nagios –with-nagios-user=nagios –with-nagios-group=nagios –enable-command-args –enable-ssl
#make all
#make install-plugin
#make install-daemon
#make install-daemon-config
3、配置 NRPE
# vi /usr/local/nagios/etc/nrpe.cfg
log_facility=daemon
pid_file=/var/run/nrpe.pid
server_port=5666
# 修改为本机的 IP
server_address=192.168.1.101
nrpe_user=nagios
nrpe_group=nagios
# 修改为 Nagios 服务端的 IP
allowed_hosts=192.168.1.100
command_timeout=60
4、启动 nrpe
# /usr/local/nagios/bin/nrpe -c /usr/local/nagios/etc/nrpe.cfg -d
为了方便 NRPE 的启动,可以将如下内容定义为 /etc/init.d/nrped 脚本
#!/bin/bash
NRPE=/usr/local/nagios/bin/nrpe
NRPECONF=/usr/local/nagios/etc/nrpe.cfg
case “$1” in
start)
echo -n “Staring NRPE daemon….”
$NRPE -c $NRPECONF -d
echo “done..”
;;
stop)
echo -n “Stopping NRPE daemon….”
pkill -u nagios nrpe
echo “done..”
;;
restart)
$0 stop
sleep 1
$0 start
;;
*)
echo “Usage: $0 start|stop|restart”
esac
exit 0
5、配置示例
vi /usr/local/nagios/etc/nrpe.cfg
command[check_users]=/usr/local/nagios/libexec/check_users -w 10 -c 20
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_sd1]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda6
command[check_sd2]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% -p /dev/sda3
command[check_zombie_procs]=/usr/local/nagios/libexec/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/local/nagios/libexec/check_procs -w 200 -c 400
command[check_cpu]=/usr/local/nagios/libexec/check_cpu.sh -w 50 -c 80
command[check_mem]=/usr/local/nagios/libexec/check_mem.sh -w 50 -c 80
command[check_swap]=/usr/local/nagios/libexec/check_swap -w 20% -c 10%
四、NRPE 服务端
1、安装 NRPE
#tar zxvf nrpe-2.15.tar.gz
#cd nrpe-2.15
#./configure –with-nrpe-user=nagios -with-nrpe-group=nagios –with-nagios-user=nagios –with-nagios-group=nagios –enable-command-args –enable-ssl –with-mysql
#make all
#make install-plugin
2、定义如何监控远程主机及服务
通过 NPRE 监控远程 Linux 主机要使用 check_nrpe 插件进行,其语法格式如下:
check_nrpe -H <host> [-n] [-u] [-p <port>] [-t <timeout>] [-c <command>] [-a <arglist…>]
示例:
define command
{
command_name check_swap_nrpe
command_line $USER1$check_nrpe -H “$HOSTADDRESS$” -c “check_swap”
}
如果还希望在监控远程 LINUX 主机时还能向其传递参数,则可以使用类似如下方式进行:
#cd /etc/nagios/objects/
#vi commands.cfg \\ 增加以下内容
define command{
command_name check_nrpe
command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
贴出一个新增加的配置:
define host{
use linux-server
host_name linhost
alias My Linux Host
address 192.168.1.101
}
define service{
use generic-service
host_name linhost
service_description CHECK USERS
check_command check_nrpe!check_users
}
define service{
use generic-service
host_name linhost
service_description Load
check_command check_nrpe!check_load
}
define service{
use generic-service
host_name linhost
service_description SDA1
check_command check_nrpe!check_sd1
}
define service{
use generic-service
host_name linhost
service_description SDA2
check_command check_nrpe!check_sd2
}
define service{
use generic-service
host_name linhost
service_description Zombie
check_command check_nrpe!check_zombie_procs
}
define service{
use generic-service
host_name linhost
service_description total procs
check_command check_nrpe!check_total_procs
}
五、增加监控脚本
比如 CPU、内存、LVS 等、需要自己写脚本来做、其实 so easy,只要注意 2 个点就 OK,控制输入(参数等)、格式化输出。只要输出格式符合 Nagios 的格式识别方式就行
1、CPU 监控
vi check_cpu.sh
#!/bin/bash
# Filename: check_cpu.sh
# Uses bash features ("function", [[ ]]), so it runs under bash rather than plain sh.
procinfo=$(which procinfo 2>/dev/null)
sar=$(which sar 2>/dev/null)
function help {
    echo -e "\n\tThis plugin shows the % of used CPU, using either procinfo or sar (whichever is available)\n\n\t$0:\n\t\t-c <integer>\tIf the % of used CPU is above <integer>, returns CRITICAL state\n\t\t-w <integer>\tIf the % of used CPU is below CRITICAL and above <integer>, returns WARNING state\n"
    exit 3  # UNKNOWN for Nagios (the original "exit -1" would become 255)
}
# Getting parameters:
while getopts "w:c:h" OPT; do
    case $OPT in
        "w") warning=$OPTARG;;
        "c") critical=$OPTARG;;
        "h") help;;
    esac
done
# Checking parameters:
if [ -z "$warning" ] || [ -z "$critical" ]; then
    echo "ERROR: You must specify warning and critical levels"
    help
fi
if [[ "$warning" -ge "$critical" ]]; then
    echo "ERROR: critical level must be higher than warning level"
    help
fi
# Assuring that the needed tools exist:
if [ -n "$procinfo" ]; then
    command="procinfo"
elif [ -n "$sar" ]; then
    command="sar"
else
    echo "ERROR: You must have either procinfo or sar installed in order to run this plugin"
    exit 3
fi
# Doing the actual check:
if [ "$command" == "procinfo" ]; then
    idle=$($procinfo | grep idle | cut -d% -f1 | awk '{print $NF}' | cut -d. -f1)
else
    idle=$($sar | tail -1 | awk '{print $8}' | cut -d. -f1)
fi
# Without this guard an empty value would silently be treated as 0% idle.
[ -z "$idle" ] && echo "UNKNOWN - could not read idle CPU from $command" && exit 3
used=$((100 - idle))
# Comparing the result and setting the correct level:
if [[ $used -ge $critical ]]; then
    msg="CRITICAL"
    status=2
elif [[ $used -ge $warning ]]; then
    msg="WARNING"
    status=1
else
    msg="OK"
    status=0
fi
# Printing the results:
echo "$msg - CPU used=$used% idle=$idle% | 'CPU Usage'=$used%;$warning;$critical;"
# Bye!
exit $status
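Change the owner and group and add execute permission; the same steps apply to every script below.
#chown nagios:nagios check_cpu.sh
#chmod +x check_cpu.sh
#./check_cpu.sh -w 60 -c 80
[Problem] The script uses the sar command to read system resource usage, and sar may not be installed on the system.
Solution:
#yum -y install sysstat
The first run may still fail: a file to hold the records has to be created first [named for the current date], e.g. sar -o 16
The monitored hosts need the same setup [omitted here].
[Note] Add a crontab entry so that the file recording the CPU data is generated every day.
#crontab -e    (remember to check that the cron service is actually running)
1 0 * * * /usr/lib64/sa/sa1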
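The script above reads the idle percentage from field 8 of the last line of sar output. As a hedged alternative, the sketch below finds the `%idle` column from the header line and counts its position from the right, because the header can carry an extra AM/PM field that the `Average:` line lacks. The function name `cpu_used_from_sar` is hypothetical, and the sketch assumes English output (for example `LC_ALL=C sar`) ending with an `Average:` line.

```shell
#!/bin/bash
# cpu_used_from_sar: reads sar text on stdin and prints the used CPU % as an
# integer. Returns 1 if no %idle header or no "Average:" line was found.
cpu_used_from_sar() {
    awk '
        # Remember how far %idle is from the last column of the header.
        /%idle/ { for (i = 1; i <= NF; i++) if ($i == "%idle") back = NF - i; seen = 1 }
        # Take the same column, counted from the right, on the Average line.
        /^Average:/ && seen { idle = $(NF - back) }
        END {
            if (idle == "") exit 1
            # Truncate idle to an integer, as the original cut -d. -f1 does.
            printf "%d\n", 100 - int(idle)
        }
    '
}
```

Typical use would be `LC_ALL=C sar | cpu_used_from_sar`; the result could then replace the `idle=`/`used=` computation in check_cpu.sh.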
2. Memory monitoring
vi check_mem.sh
#!/bin/bash
#DESC: OS mem check
#Author:James
function help {
    echo -e "\n\tThis plugin shows the % of used MEM, using free\n\n\t$0:\n\t\t-c <integer>\tIf the % of used MEM is above <integer>, returns CRITICAL state\n\t\t-w <integer>\tIf the % of used MEM is below CRITICAL and above <integer>, returns WARNING state\n"
    exit 3  # UNKNOWN for Nagios
}
while getopts "w:c:h" OPT; do
    case $OPT in
        "w") warning=$OPTARG;;
        "c") critical=$OPTARG;;
        "h") help;;
    esac
done
if [ -z "$warning" ] || [ -z "$critical" ]; then
    echo "ERROR: You must specify warning and critical levels"
    help
fi
# Assumes the older "free" layout: total used free shared buffers cached.
# Newer versions of free merge buffers and cache into a single column.
set -- $(free | head -2 | tail -1)
MEMTOTAL=$2
MEMUSED=$3
MEMFREE=$4
MEMBUFFERS=$6
MEMCACHED=$7
# Memory really in use: buffers and cache can be reclaimed, so subtract them.
REALMEMUSED=$(echo "$MEMUSED - $MEMBUFFERS - $MEMCACHED" | bc)
USEPCT=$(echo "scale=3; $REALMEMUSED / $MEMTOTAL * 100" | bc -l)
REALMEMUSEDmb=$(echo "($REALMEMUSED)/1024" | bc)
MEMTOTALMB=$(echo "($MEMTOTAL)/1024" | bc)
if [ "$(echo "$USEPCT > $critical" | bc)" == 1 ]; then
    echo "MEM CRITICAL - Memory usage = ${USEPCT}%,MEMTOTAL=${MEMTOTALMB}MB,RealUsed=${REALMEMUSEDmb}MB |Used=${USEPCT}%;$warning;$critical"
    exit 2
elif [ "$(echo "$USEPCT > $warning" | bc)" == 1 ]; then
    echo "MEM WARNING - Memory usage = ${USEPCT}%,MEMTOTAL=${MEMTOTALMB}MB,RealUsed=${REALMEMUSEDmb}MB |Used=${USEPCT}%;$warning;$critical"
    exit 1
# "<=" so that a value exactly equal to the warning level still counts as OK.
elif [ "$(echo "$USEPCT <= $warning" | bc)" == 1 ]; then
    echo "MEM OK - Memory usage = ${USEPCT}%,MEMTOTAL=${MEMTOTALMB}MB,RealUsed=${REALMEMUSEDmb}MB |Used=${USEPCT}%;$warning;$critical"
    exit 0
else
    echo "MEM ERROR - Unable to determine memory usage"
    exit 3
fi
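Since the column layout of `free` differs between versions, the same "used minus buffers minus cache" figure can also be taken from `/proc/meminfo`. The sketch below is one possible way to do that and is not part of the original script; the function name `mem_used_pct` is made up for illustration. It treats used memory as MemTotal - MemFree, which matches the "used" column of the older `free` layout.

```shell
#!/bin/bash
# mem_used_pct: reads /proc/meminfo-formatted text on stdin and prints the
# percentage of memory in use, excluding buffers and page cache.
# Returns 1 if MemTotal is missing or zero.
mem_used_pct() {
    awk '
        $1 == "MemTotal:" { total   = $2 }
        $1 == "MemFree:"  { free    = $2 }
        $1 == "Buffers:"  { buffers = $2 }
        $1 == "Cached:"   { cached  = $2 }
        END {
            if (total <= 0) exit 1
            printf "%.1f\n", (total - free - buffers - cached) / total * 100
        }
    '
}
```

It would be called as `mem_used_pct < /proc/meminfo`, and the value compared against the warning and critical levels with bc exactly as in check_mem.sh.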
3. LVS monitoring
vi check_lvs.sh
#!/bin/bash
USAGE_Method="$(basename $0) [-h|--hostname] <Free ip or hostname> [-w|--warning] <Free integer> [-c|--critical] <Free integer>"
USAGE_Value="warning value must be smaller than critical value: $(basename $0) $*"
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
if [ $# -lt 4 ]; then
    echo "Usage: $USAGE_Method"
    exit $STATE_UNKNOWN
fi
while [ $# -gt 0 ];
do
    case "$1" in
    -w|--warning)
        shift
        warning=$1
        ;;
    -c|--critical)
        shift
        critical=$1
        ;;
    esac
    shift
done
if [[ $warning -ge $critical ]]; then
    #echo $warning
    #echo $critical
    echo "$USAGE_Value"
    echo "Usage: $USAGE_Method"
    # Bad arguments are UNKNOWN; the original "exit 0" would have reported OK.
    exit $STATE_UNKNOWN
fi
ACT_COUNT=0
Inactive_count=0
# The nagios user needs sudo rights for ipvsadm.
stat1=$(sudo ipvsadm | grep http | grep Route | wc -l)
if [ $stat1 -ne 0 ]; then
    # Column 5 of each real-server line is ActiveConn, column 6 is InActConn.
    for NUM in $(sudo ipvsadm | grep http | grep Route | awk '{print $5}')
    do
        ACT_COUNT=$(($ACT_COUNT + $NUM))
    done
    for NUM in $(sudo ipvsadm | grep http | grep Route | awk '{print $6}')
    do
        Inactive_count=$(($Inactive_count + $NUM))
    done
else
    echo "stat1:$stat1, lvs critical, lvs is down now."
    # The message says critical, so exit 2 (the original exited 3, i.e. UNKNOWN).
    exit $STATE_CRITICAL
fi
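The script as listed stops after summing the connection counts and never compares them with the thresholds or prints a status line. A possible final step is sketched below. It assumes that the warning and critical levels apply to the total number of active connections, which the original does not state; the function name `lvs_report` is hypothetical.

```shell
#!/bin/bash
# lvs_report ACTIVE INACTIVE WARN CRIT
# ASSUMPTION: thresholds are compared against the active connection count.
# Prints a Nagios status line with perfdata and returns 0, 1 or 2.
lvs_report() {
    local active=$1 inactive=$2 warn=$3 crit=$4
    local state=OK status=0
    if [ "$active" -ge "$crit" ]; then
        state=CRITICAL; status=2
    elif [ "$active" -ge "$warn" ]; then
        state=WARNING; status=1
    fi
    echo "LVS $state - active=$active inactive=$inactive | active=$active;$warn;$crit inactive=$inactive"
    return $status
}
```

Appended to check_lvs.sh, it would be called as `lvs_report "$ACT_COUNT" "$Inactive_count" "$warning" "$critical"; exit $?`.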
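4. MySQL monitoring
On the MySQL server to be monitored, create a database dedicated to Nagios. (The password "nagios" is only an example; use a stronger one in production.)
mysql> create database nagdb default CHARSET=utf8;
mysql> grant select on nagdb.* to 'nagios'@'192.168.1.100';
mysql> update mysql.user set Password = PASSWORD('nagios') where user='nagios';
mysql> flush privileges;
Direct changes to mysql.user only take effect after flush privileges. Then test from the Nagios server:
#/usr/local/nagios/libexec/check_mysql -H 192.168.1.101 -u nagios -d nagdb -p nagios -w 10 -c 30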
5. memcached monitoring
This uses a plugin written in Perl that needs several dependency packages, which is quite a pain... it was not easy for me either.
(1) Install the modules
#yum -y install perl-Carp-Clan perl-Cache-Memcached perl-Nagios-Plugin
- If they cannot be installed:
#wget http://dag.wieers.com/rpm/packages/rpmforge-release/rpmforge-release-0.5.2-2.rf.src.rpm
#rpm -ivh rpmforge-release-0.5.2-2.rf.src.rpm
#yum -y install perl-Nagios-Plugin.noarch perl-Carp-Clan.noarch perl-Cache-Memcached.noarch
- If perl-Nagios-Plugin still cannot be installed:
#wget http://packages.sw.be/perl-Nagios-Plugin/perl-Nagios-Plugin-0.33-1.el5.rf.noarch.rpm
#rpm -ivh perl-Nagios-Plugin-0.33-1.el5.rf.noarch.rpm --force --nodeps
(2) Install the plugin
Download Nagios-Plugins-Memcached-0.02.tar.gz and install it. [There are many dependencies; pay attention to where the .pm files end up.]
#tar xzvf Nagios-Plugins-Memcached-0.02.tar.gz
#cd Nagios-Plugins-Memcached-0.02
#yum -y install perl-CPAN
# perl Makefile.PL
- This step asks several questions; answer as you see fit or just press Enter all the way through, either works.
# make
- At this point it downloads some things needed at run time.
# make install
- By default the check_memcached file is installed as /usr/local/bin/check_memcached
- That is fine; copy it into the Nagios libexec directory.
#cp /usr/local/bin/check_memcached /usr/local/nagios/libexec/
#chown nagios:nagios /usr/local/nagios/libexec/check_memcached
Add the following entries to commands.cfg. (Here I did not install check_memcached on the memcached server; the check_memcached on the Nagios server connects directly to port 11211 of the memcached server. You can of course also install it on the memcached server and fetch its values through check_nrpe.)
define command {
command_name check_memcached_11211
command_line $USER1$/check_memcached -H 192.168.1.101:11211 --size-warning 80 --size-critical 90
}
The command above monitors the proportion of memory that memcached is using.
define command {
command_name memcached_response_11211
command_line /usr/local/bin/check_memcached -H 192.168.1.101 -w 300 -c 500
}
This one monitors whether memcached still responds.
define command {
command_name check_memcached_hit
command_line /usr/local/bin/check_memcached -H 192.168.1.101 --hit-warning 10 --hit-critical 5
}
./check_memcached -H 192.168.108.96 -w 300 -c 500
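To see what a hit-rate check boils down to, the sketch below computes the rate from the `get_hits` and `get_misses` counters in memcached's plain-text `stats` reply. It is an independent illustration and makes no claim about how check_memcached calculates its own value; the function name `memcached_hit_rate` is hypothetical.

```shell
#!/bin/bash
# memcached_hit_rate: reads the reply to memcached's "stats" command on stdin
# and prints the hit rate as an integer percentage.
# Returns 1 when there have been no get requests (or no stats were read).
memcached_hit_rate() {
    # The protocol ends lines with \r\n, so strip the carriage returns first.
    tr -d '\r' | awk '
        $1 == "STAT" && $2 == "get_hits"   { hits   = $3 }
        $1 == "STAT" && $2 == "get_misses" { misses = $3 }
        END {
            total = hits + misses
            if (total == 0) exit 1
            printf "%d\n", hits * 100 / total
        }
    '
}
```

With netcat available, it could be fed with something like `printf 'stats\r\nquit\r\n' | nc 192.168.1.101 11211 | memcached_hit_rate`.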