EMS Refresh Status


#1

It would be a good idea to report the status of the most recent EMS Refresh attempt for each provider. In addition, if an error occurred, we should include instructions for enabling the provider’s various debug logs to get more details.
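A minimal sketch of keeping only the most recent refresh outcome per provider might look like the following. `RefreshStatus`, `record_refresh`, `last_refresh` and `DEBUG_LOG_HINT` are hypothetical names used for illustration. Any exception raised by the refresh is treated as a failure, and the hint text is a placeholder for the real provider-specific debug-log instructions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, Dict, Optional

# Hypothetical placeholder; real debug-log instructions would be provider specific.
DEBUG_LOG_HINT = "Enable the provider's debug logs and retry the refresh for more details."


@dataclass
class RefreshStatus:
    """Outcome of the most recent EMS Refresh attempt for one provider (hypothetical model)."""
    succeeded: bool
    finished_at: datetime
    error: Optional[str] = None
    hint: Optional[str] = None


# provider name -> status of its most recent refresh attempt
last_refresh: Dict[str, RefreshStatus] = {}


def record_refresh(provider: str, refresh: Callable[[], None]) -> RefreshStatus:
    """Run a refresh and record its outcome, replacing any earlier status."""
    try:
        refresh()
        status = RefreshStatus(succeeded=True, finished_at=datetime.now(timezone.utc))
    except Exception as exc:  # any failure marks the provider's status as bad
        status = RefreshStatus(
            succeeded=False,
            finished_at=datetime.now(timezone.utc),
            error=str(exc),
            hint=DEBUG_LOG_HINT,
        )
    last_refresh[provider] = status
    return status


def failing_refresh() -> None:
    # Stand-in for a refresh that fails, e.g. because of a bad OpenStack credential.
    raise RuntimeError("authentication failed")


status = record_refresh("openstack-example", failing_refresh)
```

Because only the latest attempt is stored, a later successful refresh clears the error state automatically.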

Ideally, this information would be displayed on the provider’s summary page, and the user would also be alerted through the quad icon (e.g., an exclamation point).

The impetus behind this is that we are seeing several situations where OpenStack installations present problems during EMS Refresh, and the only way to know that there is a problem is when the user notices that no data has been collected and checks the log.

This type of reporting would give the user a quicker indication that there is a problem and a better way to diagnose it.


#2

@blomquisg this is a great idea. I’d add that it would also be very helpful to raise an operational event when an EMS Refresh fails, so that the event can be handled by the alerting mechanism. This would be similar to the events the appliance processes already raise (e.g., DB 80% full, Master appliance down), with a new event such as EMS_REFRESH_ERROR.
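To illustrate the idea, here is a minimal publish/subscribe sketch in which a failed refresh raises an `EMS_REFRESH_ERROR` event that an alerting handler receives. This is not the appliance's actual event system; `subscribe`, `raise_event` and the payload keys are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List

Handler = Callable[[Dict[str, Any]], None]

# event type -> handlers interested in it (hypothetical in-process registry)
_subscribers: Dict[str, List[Handler]] = defaultdict(list)


def subscribe(event_type: str, handler: Handler) -> None:
    """Register a handler, e.g. an alert definition, for an event type."""
    _subscribers[event_type].append(handler)


def raise_event(event_type: str, payload: Dict[str, Any]) -> None:
    """Deliver the event payload to every handler subscribed to its type."""
    for handler in _subscribers[event_type]:
        handler(payload)


# Stand-in for the alerting mechanism: collect the events it receives.
alerts: List[Dict[str, Any]] = []
subscribe("EMS_REFRESH_ERROR", alerts.append)

raise_event(
    "EMS_REFRESH_ERROR",
    {"provider": "openstack-example", "error": "authentication failed"},
)
```

The payload is where the data elements identifying what went wrong in the refresh would be carried.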


#3

In addition, I think we can reuse the EMS quad icon that we use for password validation and turn it into a more general status indicator. If a refresh fails, the status goes bad, with a tooltip explaining why. If the password is bad or has been changed out from under us, the refresh would fail anyway, so overloading the icon seems like a good fit in that case. If the status is good, the tooltip could show the last refresh time (e.g., Last Refreshed 5 minutes ago). In grid mode, we would show the text directly. This way, we probably don’t need a task, and if it’s made a virtual column, it could be made reportable.
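A sketch of how one status quadrant could combine credential validation and refresh results is shown below. `ProviderStatus`, `quad_status` and the state names (`ok`, `error`, `unknown`) are hypothetical. A credential problem is checked first because it would cause the refresh to fail anyway. The tooltip string is the same text that grid mode or a virtual column could display.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple


@dataclass
class ProviderStatus:
    """Hypothetical per-provider fields backing the status quadrant."""
    credentials_valid: bool
    last_refresh_at: Optional[datetime]
    last_refresh_error: Optional[str] = None


def _ago(delta: timedelta) -> str:
    """Render elapsed time coarsely, e.g. '5 minutes ago'."""
    minutes = int(delta.total_seconds() // 60)
    if minutes < 1:
        return "less than a minute ago"
    if minutes < 60:
        return f"{minutes} minute{'s' if minutes != 1 else ''} ago"
    hours = minutes // 60
    return f"{hours} hour{'s' if hours != 1 else ''} ago"


def quad_status(status: ProviderStatus, now: datetime) -> Tuple[str, str]:
    """Return (state, tooltip) for the quad icon."""
    if not status.credentials_valid:
        return "error", "Invalid credentials"
    if status.last_refresh_error:
        return "error", f"Last refresh failed: {status.last_refresh_error}"
    if status.last_refresh_at is None:
        return "unknown", "Never refreshed"
    return "ok", f"Last Refreshed {_ago(now - status.last_refresh_at)}"


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
healthy = ProviderStatus(credentials_valid=True, last_refresh_at=now - timedelta(minutes=5))
print(quad_status(healthy, now))  # ('ok', 'Last Refreshed 5 minutes ago')
```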

The only thing I can think of that would make this tricky is partial refreshes. It will be trickier still if we move to skeletal refreshes.


#4

@Fryguy, I like the idea. From a system administrator’s point of view, I would want to know when a refresh has failed, even a partial/skeletal one (I am less worried about success, which is expected, but it could be good to have access to that information anyway). The quad icon could show a different indicator for full vs. targeted refreshes, and the operational event could carry the specific data elements needed to identify where the refresh went wrong.


#5

I think for a first pass, we would ignore the targeted refresh status.

Jason suggested that we could initiate a full refresh if a targeted refresh fails. That way, if the full refresh also fails, we would record the error for the full refresh and be able to alert the user accordingly.
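The fallback can be sketched as follows, assuming refreshes are plain callables that raise on failure. `refresh_with_fallback` and `record_error` are hypothetical names. In line with ignoring targeted refresh status for a first pass, a targeted failure is swallowed, and only a failure of the full refresh is recorded.

```python
from typing import Callable, List


def refresh_with_fallback(
    targeted: Callable[[], None],
    full: Callable[[], None],
    record_error: Callable[[str], None],
) -> bool:
    """Try a targeted refresh; if it fails, fall back to a full refresh.

    Returns True if either refresh succeeded. Only a full-refresh failure
    is passed to record_error, so alerts are based on the full refresh.
    """
    try:
        targeted()
        return True
    except Exception:
        pass  # targeted refresh status is ignored in this first pass
    try:
        full()
        return True
    except Exception as exc:
        record_error(f"Full refresh failed: {exc}")
        return False


errors: List[str] = []


def failing() -> None:
    raise RuntimeError("timeout")


recovered = refresh_with_fallback(failing, lambda: None, errors.append)  # full refresh succeeds
failed = refresh_with_fallback(failing, failing, errors.append)          # both fail
print(errors)  # ['Full refresh failed: timeout']
```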

If at some point we decide we need to show information about the targeted refresh, we could figure that out later.