Monitoring heartbeat with Nagios: Configuration Tutorial (Clusters and High Availability)

Published: 2019-12-08. Editor: 脚本学堂

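This article shows how to monitor heartbeat with Nagios. It covers the heartbeat commands the check relies on and gives configuration examples for both the Nagios server and the Nagios client.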

First, a few commands. They are installed together with heartbeat, and the monitoring script relies on them:

[root@usvr-210 libexec]# which cl_status
/usr/bin/cl_status
[root@usvr-210 libexec]# cl_status listnodes   # list the nodes in the current heartbeat cluster
192.168.3.1
usvr-211
usvr-210
[root@usvr-210 libexec]# cl_status nodestatus usvr-211  # show the status of a node
active
[root@usvr-210 libexec]# cl_status nodestatus 192.168.3.1  # show the status of a node
ping

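check_heartbeat.sh lists every node in the cluster and checks whether each one is in a healthy state. In this setup the healthy states are ping and active.
When the number of active + ping nodes is 0, the result is CRITICAL.
When the number of active + ping nodes is less than the total number of nodes, the result is WARNING.
When the number of active + ping nodes equals the total number of nodes, the result is OK.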
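
The three rules above can be tried without a running cluster. The following is only a minimal sketch: node_status is a hypothetical stub that stands in for `cl_status nodestatus <node>` and returns the values from the session shown earlier, and the "dead" status for any other node is an assumption made for illustration.

```shell
#!/bin/bash
# HYPOTHETICAL stub replacing `cl_status nodestatus <node>` so the
# sketch runs on a machine without heartbeat installed.
node_status () {
    case "$1" in
        192.168.3.1) echo "ping" ;;
        usvr-211)    echo "active" ;;
        usvr-210)    echo "active" ;;
        *)           echo "dead" ;;   # assumed status for any other node
    esac
}

# Print the Nagios status line for the given nodes and return the
# matching plugin exit code (0 = OK, 1 = WARNING, 2 = CRITICAL).
classify_nodes () {
    local total=0 healthy=0 node status
    for node in "$@"; do
        status=$(node_status "$node")
        total=$((total + 1))
        if [ "$status" = "active" ] || [ "$status" = "ping" ]; then
            healthy=$((healthy + 1))
        fi
    done
    if [ "$healthy" -eq 0 ]; then
        echo "Heartbeat CRITICAL: $healthy/$total"; return 2
    elif [ "$healthy" -ne "$total" ]; then
        echo "Heartbeat WARNING: $healthy/$total"; return 1
    fi
    echo "Heartbeat OK: $healthy/$total"; return 0
}

classify_nodes 192.168.3.1 usvr-211 usvr-210
```

Passing a node name the stub does not know, for example `classify_nodes usvr-210 other-node`, exercises the WARNING branch.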

Worked example:

[root@usvr-210 libexec]# cat check_heartbeat.sh
#!/bin/bash
# Author: Emmanuel Bretelle
# Date: 12/03/2010
# Description: Retrieve linux HA cluster status using cl_status
# Based on http://www.randombugs.com/linux/howto-monitor-linux-heartbeat-snmp.html
#
# Autor: Stanila Constantin Adrian
# Date: 20/03/2009
# Description: Check the number of active heartbeats
# http://www.randombugs.com

# Get program path
REVISION=1.3
PROGNAME=`/bin/basename $0`
PROGPATH=`echo $0 | /bin/sed -e 's,[/][^/][^/]*$,,'`

NODE_NAME=`uname -n`
CL_ST='/usr/bin/cl_status'

#nagios error codes
#. $PROGPATH/utils.sh
OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3

usage () {
    echo "
Nagios plugin to heartbeat.

Usage:
$PROGNAME
$PROGNAME [--help | -h]
$PROGNAME [--version | -v]

Options:
--help -h Print this help information
--version -v Print version of plugin
"
}

help () {
    # print_revision and support are defined in utils.sh, which is not
    # sourced above, so print the version directly instead.
    echo "$PROGNAME $REVISION"
    echo; usage; echo
}

while test -n "$1"
do
case "$1" in
    --help | -h)
      help
      exit $OK;;
    --version | -v)
      echo "$PROGNAME $REVISION"
      exit $OK;;
#    -H)
#      shift
#      HOST=$1;;
#    -C)
#      shift
#      COMMUNITY=$1;;
    *)
      echo "Heartbeat UNKNOWN: Wrong command usage"; exit $UNKNOWN;;
esac
shift
done

# Stop early if heartbeat is not running on this node
$CL_ST hbstatus > /dev/null
res=$?
if [ $res -ne 0 ]
then
    echo "Heartbeat CRITICAL: Heartbeat is not running on this node"
    exit $CRITICAL
fi

declare -i I=0   # total number of nodes
declare -i A=0   # nodes in a healthy state (active or ping)
NODES=`$CL_ST listnodes`

for node in $NODES
do
    status=`$CL_ST nodestatus $node`
    let I=$I+1
    # The original script counted only "active" nodes. "ping" is also a
    # healthy state, so both are accepted here.
    if [ "$status" = "active" ] || [ "$status" = "ping" ]
    then
        let A=$A+1
    fi
done

if [ $A -eq 0 ]
then
    echo "Heartbeat CRITICAL: $A/$I"
    exit $CRITICAL
elif [ $A -ne $I ]
then
    echo "Heartbeat WARNING: $A/$I"
    exit $WARNING
else
    echo "Heartbeat OK: $A/$I"
    exit $OK
fi

The Nagios clients are the two LVS cluster nodes, usvr-210 and usvr-211. The Nagios server collects the monitoring data from them through check_nrpe.

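I. Nagios client
1. Copy the script into the Nagios plugin directory and set its permissions

cp check_heartbeat.sh /usr/local/nagios/libexec/
chmod a+x /usr/local/nagios/libexec/check_heartbeat.sh
chown nagios:nagios /usr/local/nagios/libexec/check_heartbeat.sh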

2. Add the check command to the NRPE configuration file on the Nagios client.
vim /usr/local/nagios/etc/nrpe.cfg

command[check_heartbeat]=/usr/local/nagios/libexec/check_heartbeat.sh

3. Reload the configuration (NRPE runs under xinetd in this setup).

service xinetd reload
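
Before relying on NRPE, it can help to run the plugin by hand and look at the exit code, since Nagios derives the service state from it. The sketch below is an illustration only: try_plugin and state_name are hypothetical helper names, and a stand-in command replaces the real plugin so the example runs anywhere.

```shell
#!/bin/bash
# Map a Nagios plugin exit code to its state name.
state_name () {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Run any plugin command line and report the exit code Nagios would see.
# try_plugin is a HYPOTHETICAL helper, not part of Nagios or NRPE.
try_plugin () {
    local output rc=0
    output=$("$@" 2>&1) || rc=$?
    echo "exit=$rc state=$(state_name "$rc") output=$output"
}

# Stand-in for the real plugin. On a cluster node the argument would be
# /usr/local/nagios/libexec/check_heartbeat.sh instead.
try_plugin sh -c 'echo "Heartbeat WARNING: 2/3"; exit 1'
```

Once the local run looks right, the same check can usually be exercised from the server side with `check_nrpe -H usvr-210 -c check_heartbeat`, assuming check_nrpe is installed in the server's plugin directory.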

II. Nagios server
1. Add the monitoring services

define service {
    use                     local-service
    service_description     heartbeat-lvs-master
    check_command           check_nrpe!check_heartbeat
    service_groups          heartbeat_services
    host_name               usvr-210
    check_interval          5
    notifications_enabled   1
    notification_interval   30
    contact_groups          admins
}
define service {
    use                     local-service
    service_description     heartbeat-lvs-slave
    check_command           check_nrpe!check_heartbeat
    service_groups          heartbeat_services
    host_name               usvr-211
    check_interval          5
    notifications_enabled   1
    notification_interval   30
    contact_groups          admins
}
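
The two definitions differ only in host_name and the suffix of service_description, so for larger clusters they could be generated rather than copied. This is a rough sketch, not part of the original setup: emit_heartbeat_services is a hypothetical helper, and every field value is taken from the definitions above.

```shell
#!/bin/bash
# Emit one Nagios service definition per "host:role" argument.
# emit_heartbeat_services is a HYPOTHETICAL helper; the field values
# mirror the hand-written definitions for usvr-210 and usvr-211.
emit_heartbeat_services () {
    local pair host role
    for pair in "$@"; do
        host=${pair%%:*}   # text before the first colon
        role=${pair##*:}   # text after the last colon
        cat <<EOF
define service {
    use                     local-service
    service_description     heartbeat-lvs-$role
    check_command           check_nrpe!check_heartbeat
    service_groups          heartbeat_services
    host_name               $host
    check_interval          5
    notifications_enabled   1
    notification_interval   30
    contact_groups          admins
}
EOF
    done
}

emit_heartbeat_services usvr-210:master usvr-211:slave
```

The output can be redirected into a configuration file that the Nagios server already includes.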

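2. Validate and reload the configuration

nagioscheck
service nagios reload

nagioscheck is not a stock Nagios command. It is presumably a local alias or wrapper for the configuration check, for example /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg.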
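
The validate-then-reload step can be chained so that a broken configuration never triggers a reload. The following is one possible sketch: reload_if_valid is a hypothetical helper, and the real command pair shown in the comment assumes the /usr/local/nagios install prefix used elsewhere in this article.

```shell
#!/bin/bash
# Run a validation command and run the reload command only if it passes.
# reload_if_valid is a HYPOTHETICAL helper. Both commands are passed as
# strings so the sketch can be tried with harmless stand-ins.
reload_if_valid () {
    local check_cmd=$1 reload_cmd=$2
    if sh -c "$check_cmd" >/dev/null 2>&1; then
        sh -c "$reload_cmd"
    else
        echo "configuration check failed, not reloading" >&2
        return 1
    fi
}

# Stand-ins. On the Nagios server the pair might be (ASSUMED paths):
#   "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg"
#   "service nagios reload"
reload_if_valid "true" "echo reloaded"
```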

Heartbeat monitoring is now set up.

Reference:
http://wiki.debuntu.org/wiki/Linux_HA_Heartbeat/Monitoring_with_Nagios

Page link: http://www.jb200.com/article/30096.html
