共计 4105 个字符,预计需要花费 11 分钟才能阅读完成。
Note: There is a file embedded within this post, please visit this post to download the file.
机房没有温度报警装置,我用此方法实现对机房温度的掌控,如果只有一台报警,则可认为单机故障,如果几台同时报警,则可认为机房空调出现了问题。
具体实现方法如下:
环境:被监控机:CentOS 6.4
1、安装硬件传感器监控软件 sensors
#yum install lm_sensors*
2、运行 sensors-detect 进行传感器检测
#sensors-detect ## 一路回车即可,此步我在虚拟机下报错,但在物理机上没有问题
3、运行 sensors 看是否能读取数据,如下像下面这样表示正常:
[root@rd02 ~]# sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +32.0°C (high = +76.0°C, crit = +100.0°C)
Core 1: +32.0°C (high = +76.0°C, crit = +100.0°C)
4、#vi /usr/local/nagios/libexec/check_cputemp ## 粘贴如下 #号之间的内容
##########################################################
#!/bin/sh
#########check_cputemp###########
#date : May 2011
#Licence GPLv2
#INSTALLATION
#the script need to install lm_sensors
#sensors’s output need like below format
#########################################
#coretemp-isa-0000#
#Adapter: ISA adapter#
#Core 0: +27°C (high = +85°C)#
#
#coretemp-isa-0001#
#Adapter: ISA adapter#
#Core 1: +25°C (high = +85°C) #
#########################################
#you can use NRPE to define service in nagios
#check_nrpe!check_cputemp.sh
# Plugin return statements
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3
print_help_msg(){
$Echo“Usage: $0 -h to get help.”
}
print_full_help_msg(){
$Echo“Usage:”
$Echo“$0 [-v] -m sensors -w cpuT -c cpuT”
$Echo“Sepicify the method to use the temperature data sensors.”
$Echo“And the corresponding Critical value must greater than Warning value.”
$Echo“Example:”
$Echo“${0} -m sensors -w 40 -c 50″
}
print_err_msg(){
$Echo“Error.”
print_full_help_msg
}
to_debug(){
if [“$Debug” = “true”]; then
$Echo“$*”>> /var/log/check_sys_temperature.log.$$ 2>&1
fi
}
unset LANG
Echo=”echo -e”
if [$# -lt 1]; then
print_help_msg
exit 3
else
while getopts :vhm:w:c: OPTION
do
case $OPTION
in
v)
#$Echo“Verbose mode.”
Debug=true
;;
m)
method=$OPTARG
;;
w)
WARNING=$OPTARG
;;
c)
CRITICAL=$OPTARG ;;
h)
print_full_help_msg
exit 3
;;
?)
$Echo“Error: Illegal Option.”
print_help_msg
exit 3
;;
esac
done
if [“$method” = “sensors”]; then
use_sensors=”true”
to_debug use_sensors
else
$Echo“Error. Must to sepcify the method to use sensors.”
print_full_help_msg
exit 3
fi
to_debug All Values are \”Warning:“$WARNING”and Critical:“$CRITICAL”\”.
fi
#########lm_sensors##################
if [“$use_sensors” = “true”]; then
sensorsCheckOut=`which sensors 2>&1`
if [$? -ne 0];then
echo $sensorsCheckOut
echo Maybe you need to check your sensors.
exit 3
fi
to_debug Use $sensorsCheckOut to check system temperature
TEMP1=`sensors | head -3 | tail -1 | gawk‘{print $3}’| grep -o [0-9][0-9]`
TEMP2=`sensors | head -4 | tail -1 | gawk‘{print $3}’| grep -o [0-9][0-9]`
SUM=$(($TEMP1 + $TEMP2))
TEMP=$(($SUM/2))
if [-z “$TEMP”] ; then
$Echo“No Data been get here. Please confirm your ARGS and re-check it with Verbose mode, then to check the log.”
exit 3
fi
to_debug temperature data is $TEMP
else
$Echo“Error. Must to sepcify the method to use sensors”
print_full_help_msg
exit 3
fi
######### Comparaison with the warnings and criticals thresholds given by user############
CPU_TEMP=$TEMP
#if [“$WARNING” != “0”] || [“$CRITICAL” != “0”]; then
if [“$CPU_TEMP” -gt “$CRITICAL”] && [“$CRITICAL” != “0”]; then
STATE=”$STATE_CRITICAL”
STATE_MESSAGE=”CRITICAL”
to_debug $STATE , Message is $STATE_MESSAGE
elif [“$CPU_TEMP” -gt “$WARNING”] && [“$WARNING” != “0”]; then
STATE=”$STATE_WARNING”
STATE_MESSAGE=”WARNING”
to_debug $STATE , Message is $STATE_MESSAGE
else
STATE=”$STATE_OK”
STATE_MESSAGE=”OK”
to_debug $STATE , Message is $STATE_MESSAGE
fi
echo“The TEMPERATURE“$STATE_MESSAGE”“-”The CPU’s Temperature is“$CPU_TEMP”℃ !”
exit $STATE
##########################################################
5、赋予上述脚本执行权限:
#chmod +x /usr/local/nagios/libexec/check_cputemp
6、配置 nrpe.cfg,添加如下一行:
command[check_cputemp]=/usr/local/nagios/libexec/check_cputemp -m sensors -w 38 -c 45
注意:以上六步均在被监控机上完成。
7、在 Nagios 服务器配置服务:
define service{
use generic-service
host_name <hostname>
service_description CPU Temperature
check_command check_nrpe!check_cputemp
}
保存后重启 nagios 服务即可。
Nagios 的详细介绍 :请点这里
Nagios 的下载地址 :请点这里
相关阅读 :
网络监控器 Nagios 全攻略 http://www.linuxidc.com/Linux/2013-07/87067.htm
Nagios 搭建与配置详解 http://www.linuxidc.com/Linux/2013-05/84848.htm
Nginx 环境下构建 Nagios 监控平台 http://www.linuxidc.com/Linux/2011-07/38112.htm
在 RHEL5.3 上配置基本的 Nagios 系统 (使用 Nagios-3.1.2) http://www.linuxidc.com/Linux/2011-07/38129.htm
CentOS 5.5+Nginx+Nagios 监控端和被控端安装配置指南 http://www.linuxidc.com/Linux/2011-09/44018.htm
Ubuntu 13.10 Server 安装 Nagios Core 网络监控运用 http://www.linuxidc.com/Linux/2013-11/93047.htm
更多 CentOS 相关信息见 CentOS 专题页面 http://www.linuxidc.com/topicnews.aspx?tid=14