Provider Connection Monitoring / Debugging


#1

The ManageIQ UI should provide some information about connectivity to Providers (ext_management_systems), both while creating new Providers and for the ongoing connection to existing Providers.

Unfortunately, I don’t have a great idea on how to present this type of information, nor what exactly we’d be looking for.

However, this largely comes from an issue raised in the IRC channel, where a community user tried to connect to an OpenStack instance and never saw any objects come back. After he enabled debug logging, he saw “ERROR -- : excon.error #<Excon::Errors::Timeout: connect timeout reached>” in the logs. To anyone who knows that we use Fog for OpenStack connections, it might be obvious that this is an OpenStack-related error. General users will definitely not know. And even with that knowledge, that specific error is very hard to diagnose.

Looking for feedback on how this might be presented, how we would translate cryptic errors, and possibly how ManageIQ could dig deeper into errors that occur to try to provide diagnostic suggestions.
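As one way to picture the "translate cryptic errors" idea, here is a minimal sketch of a lookup that maps raw error text to a plain-language suggestion. Everything in it is an assumption for illustration: the DIAGNOSTIC_HINTS table, the diagnostic_hint helper, and the wording of the hints are hypothetical and not part of ManageIQ or Fog.

```ruby
# Hypothetical lookup table: the patterns and hint texts are illustrative
# assumptions, not an existing ManageIQ feature.
DIAGNOSTIC_HINTS = [
  [/connect timeout reached/i,
   "The provider endpoint did not answer in time. Check the hostname, the " \
   "port, and any firewall between the appliance and the provider."],
  [/connection refused/i,
   "The host answered but nothing is listening on that port. Check the " \
   "port number and that the service is running."],
  [/certificate verify failed/i,
   "The SSL certificate could not be verified. Check the CA certificates " \
   "trusted by the appliance."]
].freeze

# Returns the first matching hint, or a fallback that still shows the raw error.
def diagnostic_hint(raw_message)
  _pattern, hint = DIAGNOSTIC_HINTS.find { |pattern, _hint| raw_message =~ pattern }
  hint || "No suggestion available; raw error: #{raw_message}"
end

puts diagnostic_hint("excon.error #<Excon::Errors::Timeout: connect timeout reached>")
```

A table like this keeps the raw error available for developers while giving general users something to act on.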


#2

First suggestion: do a better job of tracing the entire series of calls to OpenStack services. For instance, we may write something like this to the various logs (evm and fog): ERROR -- : excon.error #<Excon::Errors::Timeout: connect timeout reached>, but that doesn’t tell us which service timed out. It could be Keystone, Nova, Glance, or any other service.

I’m not exactly sure how to better surface that information, other than seeing whether there are better details we can get from Fog. Regardless, it should be logged better. Then it should be surfaced to the user somewhere (a provider connection audit, maybe?).
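To illustrate what a "provider connection audit" could report, here is a rough sketch that tries a plain TCP connection to each service endpoint and records which ones fail. It uses only Ruby's standard Socket.tcp and does not go through Fog. The probe_services helper is hypothetical, and the hosts and ports in the example are assumptions (the usual OpenStack default ports on localhost), so a real check would need the endpoints from the provider's service catalog.

```ruby
require "socket"

# Hypothetical helper: tries a TCP connect to each endpoint and returns a
# hash of service name => :ok or a string describing the failure.
# endpoints is expected to look like { "keystone" => ["host", 5000], ... }.
def probe_services(endpoints, timeout: 5)
  endpoints.each_with_object({}) do |(service, (host, port)), results|
    begin
      # Socket.tcp raises on timeout, refusal, or other connect failure.
      Socket.tcp(host, port, connect_timeout: timeout) { |_socket| }
      results[service] = :ok
    rescue SystemCallError, SocketError => e
      # Keep the service name next to the error so it is clear what failed.
      results[service] = "#{e.class}: #{e.message}"
    end
  end
end

if __FILE__ == $PROGRAM_NAME
  # Assumed example endpoints; replace with the provider's real ones.
  endpoints = {
    "keystone" => ["127.0.0.1", 5000],
    "nova"     => ["127.0.0.1", 8774],
    "glance"   => ["127.0.0.1", 9292]
  }
  probe_services(endpoints, timeout: 2).each do |service, status|
    puts "#{service}: #{status}"
  end
end
```

This only tests reachability, not authentication or API behavior, but it would already answer the "which service timed out" question.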


#3

We might be able to improve the logging in https://github.com/ManageIQ/manageiq/blob/master/vmdb/lib/vmdb/logging/fog_logger.rb. Even just naming the service involved, or adding a little detail along those lines, might help.
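As a rough idea of what "saying which service" could look like, here is a small sketch of a wrapper that prefixes every log message with a service name. It is built on Ruby's standard Logger only; the ServiceTaggedLogger class is hypothetical and is not based on the actual contents of fog_logger.rb, so how it would hook into the real logger is left open.

```ruby
require "logger"
require "stringio"

# Hypothetical wrapper: delegates to an underlying logger and prefixes
# each message with the name of the service the call was made against.
class ServiceTaggedLogger
  def initialize(logger, service)
    @logger = logger
    @service = service
  end

  # Define debug/info/warn/error so each one adds the service tag.
  %i[debug info warn error].each do |level|
    define_method(level) do |message|
      @logger.public_send(level, "[#{@service}] #{message}")
    end
  end
end

# Example: log to an in-memory buffer so the output can be inspected.
buffer = StringIO.new
keystone_log = ServiceTaggedLogger.new(Logger.new(buffer), "keystone")
keystone_log.error("excon.error #<Excon::Errors::Timeout: connect timeout reached>")
puts buffer.string
```

With a tag like this, the same timeout line would at least say whether it came from the Keystone, Nova, or Glance connection.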