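Task: tally each day's outages and compute a service-stability metric, working from the log written by the monitoring script.
The log looks roughly like this: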
corp_resin_a:10.11.15.35:6805 is down! 2008-08-02-20:55:16
mx7.cmail.sogou.com:10.11.15.34:6805 is down! 2008-08-02-20:55:26
mx5.cmail.sogou.com:10.11.15.46:6805 is down! 2008-08-02-20:55:26
mx10.cmail.sogou.com:192.168.95.143:6805 is down! 2008-08-02-20:55:26
corp_resin_d:192.168.132.189:6805 is down! 2008-08-02-20:55:26
corp_resin_d:192.168.132.89:6805 is down! 2008-08-02-20:55:26
corp_resin_a:10.11.15.47:6805 is down! 2008-08-02-20:55:26
corp_resin_d:192.168.131.164:6805 is down! 2008-08-02-20:55:26
corp_resin_a:10.11.15.20:6805 is down! 2008-08-02-20:55:26
mx10.cmail.sogou.com:192.168.95.143:6805 is down! 2008-08-02-20:55:37
The monitoring script runs once a minute, so each log line can be counted as one minute of downtime.
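Strictly speaking, the monitor can log the same host more than once within a minute: in the sample above, mx10.cmail.sogou.com (192.168.95.143) appears at both 20:55:26 and 20:55:37. If that matters, a stricter version of the rule counts each distinct (IP, HH:MM) pair as one minute of downtime. This is only a sketch. It assumes the `host:ip:port is down! YYYY-MM-DD-HH:MM:SS` layout shown above and a log covering a single day, and `downtime_minutes` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: count downtime minutes per IP, treating each
# distinct (IP, HH:MM) pair as one minute, so repeated entries for the
# same host inside the same minute are counted only once.
# Assumes all lines come from one day's log (the date is not in the key).
sub downtime_minutes {
    my @lines = @_;
    my (%seen, %minutes);
    for my $line (@lines) {
        # host:ip:port is down! YYYY-MM-DD-HH:MM:SS
        next unless $line =~ /^[^:]+:([^:]+):\d+ is down! \d{4}-\d{2}-\d{2}-(\d{2}:\d{2}):\d{2}/;
        my ($ip, $minute) = ($1, $2);
        # $seen{...}++ is 0 (false) the first time this pair appears
        $minutes{$ip}++ unless $seen{"$ip $minute"}++;
    }
    return %minutes;
}

my %m = downtime_minutes(
    'mx10.cmail.sogou.com:192.168.95.143:6805 is down! 2008-08-02-20:55:26',
    'mx10.cmail.sogou.com:192.168.95.143:6805 is down! 2008-08-02-20:55:37',
    'corp_resin_a:10.11.15.35:6805 is down! 2008-08-02-20:55:16',
);
print "$_ $m{$_}\n" for sort keys %m;
```

With these three lines, the two mx10 entries collapse into a single minute, so each IP ends up with 1 minute instead of mx10 getting 2.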
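I wrote it in Perl for no reason other than practice. Each log line has the form hostname:ip:port, followed by "is down!" and then the date and time.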
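The script below takes each line apart with a chain of `split` calls. The same fields can also be pulled out with one regular expression, which also rejects lines that don't fit the layout. This is a sketch that assumes every line follows the `host:ip:port is down! YYYY-MM-DD-HH:MM:SS` format above. `parse_line` is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one monitor line into named fields.
# Returns an empty list if the line does not match the expected layout.
sub parse_line {
    my ($line) = @_;
    return () unless $line =~ /
        ^ ([^:]+) : ([^:]+) : (\d+)               # host, ip, port
        \s+ is \s+ down! \s+
        (\d{4}-\d{2}-\d{2}) - (\d{2}:\d{2}:\d{2}) # date, time
    /x;
    return (host => $1, ip => $2, port => $3, date => $4, time => $5);
}

my %f = parse_line('corp_resin_a:10.11.15.35:6805 is down! 2008-08-02-20:55:16');
print "$f{ip} $f{port} $f{time}\n";   # 10.11.15.35 6805 20:55:16
```

The `/x` modifier lets the pattern be spread out with comments. Otherwise it matches exactly what the `split` chain extracts.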
#!/usr/bin/perl
# Tally one day's outages by service type (port) and by IP.
# Input lines look like: host:ip:port is down! YYYY-MM-DD-HH:MM:SS
use strict;
use warnings;

# Failure counters per service type
my ($web, $pop3, $smtp, $master, $slave, $resin) = (0) x 6;
my $totaltimes = 0;    # total failure records
my %totalip;           # failure records per IP

my ($curlogfile) = @ARGV;
die "usage: $0 LOGFILE\n" unless defined $curlogfile;
open(my $fh, '<', $curlogfile) or die "cannot open $curlogfile: $!\n";
while (my $line = <$fh>) {
    chomp $line;
    next unless $line =~ /down/;    # count only "is down!" records
    $totaltimes++;

    # Space-separated fields: [0] host:ip:port, [1] "is", [2] "down!", [3] date-time
    my @items = split / /, $line;
    my (undef, $ip, $port) = split /:/, $items[0];
    # date-time splits on "-" into year, month, day, HH:MM:SS (time is not used yet)
    my $time = (split /-/, $items[3])[3];

    # Map the port to a service type
    if    ($port eq '2000')       { $master++ }
    elsif ($port eq '9002')       { $slave++ }
    elsif ($port eq '80')         { $web++ }
    elsif ($port eq '25')         { $smtp++ }
    elsif ($port eq '110')        { $pop3++ }
    elsif ($port =~ /^680[2-5]$/) { $resin++ }    # anchored, so e.g. 16805 does not match

    $totalip{$ip}++;    # starts at 1 the first time an IP is seen
}
close($fh);

# Use numeric > (not string gt) when testing the counters
print "Total failures: $totaltimes\n";
print "WEB failures: $web\n"       if $web > 0;
print "POP3 failures: $pop3\n"     if $pop3 > 0;
print "SMTP failures: $smtp\n"     if $smtp > 0;
print "RESIN failures: $resin\n"   if $resin > 0;
print "MASTER failures: $master\n" if $master > 0;
print "SLAVE failures: $slave\n"   if $slave > 0;
print "Failed IP: Failures\n";
foreach my $key (sort keys %totalip) {
    print "$key $totalip{$key}\n";
}
Results:
coolerfeng@mail:~/log$ ./log.pl net-2008-08-02.txt
Total failures: 642
WEB failures: 1
POP3 failures: 1
RESIN failures: 39
MASTER failures: 601
Failed IP: Failures
10.10.71.50 1
10.10.71.92 2
10.11.15.20 2
10.11.15.34 2
10.11.15.35 2
10.11.15.46 2
10.11.15.47 2
192.168.131.164 9
192.168.131.76 1
192.168.132.189 8
192.168.132.89 7
192.168.41.194 601
192.168.95.143 3
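Clearly the master machine (192.168.41.194, with 601 failure records, i.e. roughly 601 minutes or just over 10 hours of downtime) was down far too long, and it badly hurt the day's stability.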
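The task at the top asks for a stability metric, but the script stops at raw counts. One plausible metric (my assumption, because the post does not define one) is daily availability per IP. With one log line counted as one minute of downtime, availability is 1 − minutes/1440. The sketch below applies this to a few of the counts above. `availability` is a hypothetical helper name.

```perl
use strict;
use warnings;

use constant MINUTES_PER_DAY => 24 * 60;   # 1440

# Hypothetical helper: availability in percent, assuming each failure
# record equals one minute of downtime within a single day.
sub availability {
    my ($down_minutes) = @_;
    return 100 * (1 - $down_minutes / MINUTES_PER_DAY);
}

# Failure counts copied from the output above
my %down = (
    '192.168.41.194'  => 601,
    '192.168.131.164' => 9,
    '10.10.71.50'     => 1,
);
for my $ip (sort keys %down) {
    printf "%-16s %4d min  %.2f%%\n", $ip, $down{$ip}, availability($down{$ip});
}
```

For 192.168.41.194 this works out to 1 − 601/1440 ≈ 58.26% for the day, which puts a number on the conclusion above.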