Bunny::Session error - Got an exception when receiving data


#1

Hello everyone,

I have a fresh ManageIQ deployment (latest stable version) connected to an OpenStack provider and listening for events over AMQP. Every few hours I see the following error in evm.log:

E, [2017-10-17T04:47:43.342658 #12715] ERROR – #<Bunny::Session:0x1f53010 guest@10.19.233.1:5672, vhost=/, hosts=[10.19.233.1]>: Got an exception when receiving data: Connection reset by peer (Errno::ECONNRESET)
W, [2017-10-17T04:47:43.342841 #12715] WARN – #<Bunny::Session:0x1f53010 guest@10.19.233.1:5672, vhost=/, hosts=[10.19.233.1]>: Recovering from a network failure…
W, [2017-10-17T04:47:53.343421 #12715] WARN – #<Bunny::Session:0x1f53010 guest@10.19.233.1:5672, vhost=/, hosts=[10.19.233.1]>: Retrying connection on next host in line: 10.19.233.1:5672

The RabbitMQ logs show the following error, but it repeats regardless of whether ManageIQ is connected:

=ERROR REPORT==== 17-Oct-2017::09:01:37 ===
closing AMQP connection <0.10231.131> (172.17.101.5:43508 -> 172.17.101.4:5672):
missed heartbeats from client, timeout: 60s

Running “Refresh Relationships and Power States” on the provider doesn’t reproduce the issue.

First of all, how critical is this error? Does it mean ManageIQ is not listening to OpenStack events at all, that a few events are missed on every connection drop, or is it just an annoying hiccup?
Second, how would you recommend investigating this further? Enabling debug-level logging for fog doesn’t seem to report anything about events.

Thanks,
Alex


#2

Hi! What OpenStack distribution are you using? I know that some don’t support using AMQP to read events, as the required port is restricted. We recommend using the Ceilometer option.

Mainn
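
To rule out the restricted-port case mentioned above, a quick TCP reachability check can be run from the ManageIQ appliance. This is only a minimal sketch: the function name `amqp_port_reachable` is made up for illustration, and 5672 is taken from the log lines in post #1. It only confirms that the port accepts connections, not that AMQP authentication or event delivery works.

```python
import socket


def amqp_port_reachable(host, port=5672, timeout=5.0):
    """Return True if a TCP connection to host:port opens within `timeout` seconds.

    Hypothetical helper: a plain TCP connect, no AMQP handshake is attempted.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
```

For example, `amqp_port_reachable("10.19.233.1")` would test the broker address shown in the evm.log excerpt.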


#3

We have Red Hat OpenStack 10. Ceilometer was our first choice, but we had other issues with it that led us to use AMQP. Is there any way to run a health check on the AMQP connection from the ManageIQ side? Are there any log entries I can look for?

Thanks,


#4

Hi, events are read much more often than every few hours. If these errors are not showing up every minute or so, I’d consider it OK. If the event monitor fails, it gets restarted; it’s implemented that way to handle certain kinds of errors.

It would be good to check if there are events in MIQ - check Timelines for the OpenStack provider which reports these log entries.
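
As a rough way to see how often the errors actually occur, the Bunny receive errors in evm.log can be counted per hour. The following is a sketch under assumptions: the function name `count_resets_per_hour` is made up for illustration, and the timestamp pattern and marker text are based only on the log lines quoted in post #1.

```python
import re
from collections import Counter

# Timestamp layout as it appears in the evm.log lines quoted in post #1,
# e.g. "[2017-10-17T04:47:43.342658 #12715]". Only date and hour are captured.
TIMESTAMP = re.compile(
    r"\[(\d{4}-\d{2}-\d{2}T\d{2}):\d{2}:\d{2}\.\d+ #\d+[^\]]*\]"
)
# Message text taken from the quoted ERROR line.
MARKER = "Got an exception when receiving data"


def count_resets_per_hour(lines):
    """Map 'YYYY-MM-DDTHH' to the number of Bunny receive errors in that hour."""
    counts = Counter()
    for line in lines:
        if "Bunny::Session" not in line or MARKER not in line:
            continue  # skip warnings, recovery messages and unrelated entries
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

The function accepts any iterable of lines, so an open file handle for evm.log can be passed in directly. A handful of entries per day would fit the "every few hours" pattern described in post #1, while many entries per hour would point to a connection that never stays up.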


#5

The Timeline for OpenStack seems to be working fine and catching the events. Is there anything else you would suggest trying or checking?