Applied Users Forums

Hardware & Infrastructure => Open Source => Topic started by: Mark on September 12, 2011, 01:25:47 PM

Title: Nagios - Host Down
Post by: Mark on September 12, 2011, 01:25:47 PM
This is really bugging me (and hopefully this is the right board for it).

I have a server with a handful of IP addresses set up in Nagios, with each IP address as an individual "host".  I had one IP address that I wasn't using for a while, so I disabled the service and host checks on it.  Now I am using it again (~45 days later).  Service checks are enabled, all checks are enabled on the host, all other IP addresses on this server are reported as up, and pinging this IP gives the same results as pinging all of the other IP addresses....

Nagios still reports this host as being down.

Manual ping results:
root@Nagios:/usr/local/nagios# ping 1.2.3.4 -c 5
PING 1.2.3.4 (1.2.3.4) 56(84) bytes of data.
From 10.x.x.x: icmp_seq=1 Redirect Network(New nexthop: 10.x.x.y)
64 bytes from 1.2.3.4: icmp_seq=1 ttl=56 time=62.3 ms
64 bytes from 1.2.3.4: icmp_seq=2 ttl=56 time=60.4 ms
64 bytes from 1.2.3.4: icmp_seq=3 ttl=56 time=79.0 ms
64 bytes from 1.2.3.4: icmp_seq=4 ttl=56 time=60.4 ms
64 bytes from 1.2.3.4: icmp_seq=5 ttl=56 time=60.3 ms

--- 1.2.3.4 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 60.397/64.528/79.034/7.291 ms
root@Nagios:/usr/local/nagios#


Same results if I ping a different IP on that same host.
Title: Re: Nagios - Host Down
Post by: Bloody Jack Kidd on September 12, 2011, 01:54:00 PM
the problem we sometimes have is a bit different: the OS gets an ICMP redirect when a router goes down, then doesn't update the route table once the route is available again.  Hence a blocking outage remains active in Nagios even though it's no longer the case.
Title: Re: Nagios - Host Down
Post by: Mark on September 12, 2011, 02:15:43 PM
Ok, so I should check the routing table on the nagios box?  Or how do you usually recover from this?
Title: Re: Nagios - Host Down
Post by: Bloody Jack Kidd on September 12, 2011, 03:42:27 PM
yeah, perhaps a netstat -r or a route query to see if that's poisoned (but I think that's my issue, not yours per se, since you can ping the device from the Nagios box)

my first check might be

# /usr/local/etc/rc.d/nagios reload
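
For the routing-table check mentioned above, a minimal sketch, assuming the `netstat -rn` output has a header row with a `Flags` column and that routes learned from ICMP redirects carry the `D` (dynamic) or `M` (modified) flag. The function name `redirect_routes` is made up for illustration. Depending on the OS and kernel, redirect-learned entries may only appear in a separate routing cache, which this does not cover.

```shell
# redirect_routes (hypothetical helper): reads `netstat -rn` output on stdin
# and prints only the routes whose Flags column contains D or M.
redirect_routes() {
    awk '
        # Until the header row is seen, look for the Flags column
        # (its position differs between operating systems).
        col == 0 { for (i = 1; i <= NF; i++) if ($i == "Flags") col = i; next }
        # After the header: keep rows flagged dynamic (D) or modified (M).
        $col ~ /[DM]/ { print }
    '
}
```

Usage would be something like `netstat -rn | redirect_routes`; no output means no route in the table is flagged as redirect-created.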

Title: Re: Nagios - Host Down
Post by: Mark on September 12, 2011, 04:02:40 PM
Also, running check_ping manually with the exact same syntax that commands.cfg uses returns:

./check_ping -H 1.2.3.4  -w 3000.0,80% -c 5000.0,100% -p 5
PING OK - Packet loss = 0%, RTA = 60.45 ms


I don't have any route issues as far as I can tell either.

Very odd.  I changed the host's IP and reloaded.  Now the service check should fail, because the port in question is not open on that IP, and the host should show as up... but the service still shows up and the host still shows down.

Removing and re-adding is next.
Title: Re: Nagios - Host Down
Post by: Jeff Golas on September 13, 2011, 11:00:15 AM
My question is - is it saying it's down because of a ping, or is it doing a different check? If I remember correctly, don't you tell it what type of device it is in the config file? Maybe you have it checking via SNMP?

Jeff
Title: Re: Nagios - Host Down
Post by: Mark on September 13, 2011, 11:03:19 AM
Quote from: Jeff Golas on September 13, 2011, 11:00:15 AM
My question is - is it saying it's down because of a ping, or is it doing a different check? If I remember correctly, don't you tell it what type of device it is in the config file? Maybe you have it checking via SNMP?

Jeff

Nah, it's using check_ping.
Title: Re: Nagios - Host Down
Post by: Mark on September 13, 2011, 08:05:20 PM
GAH!  USER ERROR!

I was just deleting some of the notifications and realized that I had the IP wrong in the host definition.  The service check uses an IP that I entered manually, and I typed it right there, which is why the service showed up while the host showed down.  It is x.22.26.x and I had x.26.22.x!  You'd think after seeing hundreds of these notifications I would have realized this already!

What an idiot.   :-\
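
For anyone who lands here with the same symptom, a minimal sketch for pulling the `address` directive out of a host definition so it can be compared with the IP typed into the manual check. It assumes the usual `define host{ ... }` object layout with one directive per line and the closing brace on its own line. The function name `host_address`, the host name `web2`, and the file path in the usage note are hypothetical.

```shell
# host_address (hypothetical helper): print the address configured for a
# given host_name in a Nagios object config file.
#   $1 = host_name to look up, $2 = path to the config file
host_address() {
    awk -v want="$1" '
        # Start of a host block: reset what we have collected so far.
        /^[[:space:]]*define[[:space:]]+host[[:space:]]*[{]/ { inblock = 1; name = ""; addr = ""; next }
        # Inside a block, remember the two directives we care about.
        inblock && $1 == "host_name" { name = $2 }
        inblock && $1 == "address"   { addr = $2 }
        # End of block: print the address if this was the host we wanted.
        inblock && /[}]/ { if (name == want) print addr; inblock = 0 }
    ' "$2"
}
```

Something like `host_address web2 /usr/local/nagios/etc/objects/hosts.cfg` would print the configured address, which can then be compared by eye (or with `[ "$a" = "$b" ]`) against the IP passed to `check_ping -H`.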
Title: Re: Nagios - Host Down
Post by: Bloody Jack Kidd on September 13, 2011, 08:16:26 PM
it can happen... (not to me)... but it can happen

;)