Azure Refresh Issues

fine
providers

#1

Hello all,

I’ve just deployed a new fine-2 appliance on an Azure VM and created two providers, one for East US and one for South Central US, in the same subscription. The provider for East US refreshes correctly, but South Central fails with:

[----] E, [2017-06-18T15:21:11.553687 #46816:43b14c] ERROR -- : MIQ(ManageIQ::Providers::Azure::CloudManager::Refresher#refresh) EMS: [Azure-southcentralus], id: [40792000000000007] Refresh failed
[----] E, [2017-06-18T15:21:11.553932 #46816:43b14c] ERROR -- : [NameError]: uninitialized constant ManageIQ::Providers::Azure::Armrest  Method:[rescue in block in refresh]
[----] E, [2017-06-18T15:21:11.580835 #46816:43b14c] ERROR -- : MIQ(MiqQueue#deliver) Message id: [40792000000454626], Error: [uninitialized constant ManageIQ::Providers::Azure::Armrest]
[----] E, [2017-06-18T15:21:11.580967 #46816:43b14c] ERROR -- : [EmsRefresh::Refreshers::EmsRefresherMixin::PartialRefreshError]: uninitialized constant ManageIQ::Providers::Azure::Armrest  Method:[rescue in deliver]
[----] E, [2017-06-18T15:21:11.581073 #46816:43b14c] ERROR -- : /var/www/miq/vmdb/app/models/ems_refresh/refreshers/ems_refresher_mixin.rb:50:in `refresh'

This is a new appliance, and that’s about all I could find in evm.log; any help would be appreciated.


#2

I believe this was a scoping issue that has since been fixed. If you’re willing to apply a hotfix, find the cloud_manager/refresh_parser.rb file and replace every instance of rescue Azure::Armrest::ApiException with rescue ::Azure::Armrest::ApiException (i.e. add the leading double colons).
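For anyone wondering why the leading double colons matter, here is a minimal sketch of Ruby's constant lookup. The modules below are stand-ins I made up to mirror the namespaces in the log; they are not the real provider or gem code.

```ruby
# Stand-in for the top-level namespace that the azure-armrest gem defines.
module Azure
  module Armrest
    class ApiException < StandardError; end
  end
end

# Stand-in for the provider namespace seen in the log (illustration only).
module ManageIQ
  module Providers
    module Azure
      class Parser
        # Unqualified: `Azure` is found lexically as ManageIQ::Providers::Azure,
        # and that module has no Armrest constant, so the lookup fails.
        def relative_lookup
          Azure::Armrest::ApiException
        rescue NameError => e
          e.message
        end

        # The leading :: forces the lookup to start at the top level.
        def absolute_lookup
          ::Azure::Armrest::ApiException
        end
      end
    end
  end
end

parser = ManageIQ::Providers::Azure::Parser.new
puts parser.relative_lookup  # message names ManageIQ::Providers::Azure::Armrest
puts parser.absolute_lookup  # the top-level exception class
```

The first lookup fails with the same kind of "uninitialized constant" message as in the log above, while the second resolves to the gem's class.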

If that’s not it, then I’ll need to see the full backtrace to help you out.


#3

Thanks, that seems to have fixed that error. I don’t have any info showing up for the south central provider yet besides security groups, but there’s a fair amount of stuff out there so it may just be taking a while.


#4

The fact that you reached that rescue clause means that an error of some sort did occur; it just wasn’t handled properly. So definitely keep an eye on the logs for the “real” error to see what the issue is.
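To illustrate the point: Ruby only evaluates the class named in a rescue clause when an exception is actually raised, so a bad constant there stays hidden until something else fails. A rough sketch, using a made-up stand-in module and a hypothetical refresh method rather than the real refresher:

```ruby
# Stand-in namespace (illustration only); note there is no Armrest in here.
module ManageIQ
  module Providers
    module Azure
      # Hypothetical method: succeeds unless told to fail.
      def self.refresh(fail_with: nil)
        raise fail_with if fail_with
        :ok
      rescue Azure::Armrest::ApiException
        # Never reached: resolving the constant above raises NameError first.
        :handled
      end
    end
  end
end

# No error occurs, so the rescue clause is never evaluated.
puts ManageIQ::Providers::Azure.refresh.inspect

# An underlying error triggers evaluation of the rescue clause,
# and the NameError replaces the original failure.
begin
  ManageIQ::Providers::Azure.refresh(fail_with: RuntimeError.new("underlying failure"))
rescue NameError => e
  puts e.class
end
```

So the NameError in the original log was masking some other failure, which is why the real error only shows up once the constant is fixed.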


#5

I haven’t run into the actual error yet, but the region has yet to refresh; the refresh status shows as “never”. Tailing azure.log, I see it querying a very large number of items in our storage accounts, and tailing evm.log, the only thing I see related to the South Central provider is:

[----] I, [2017-06-21T08:24:04.254542 #61941:6f5130] INFO -- : MIQ(ManageIQ::Providers::Azure::CloudManager#with_provider_connection) Connecting through ManageIQ::Providers::Azure::CloudManager: [Azure Stage 1 - South Central US]

This appears about every 30 seconds. Does it typically take a long time to import an existing subscription into ManageIQ? Sorry if these questions are simplistic; this is the first MIQ appliance I’ve set up, and I haven’t found much by googling or in the user docs.


#6

It did eventually error out. I’ve attached the stack trace, as it’s rather large. It complains of a timeout error, and it looks like that happens while it’s trying to collect the orchestration stacks. I’m also seeing another error when refreshing our North Central subscription; it too dies while retrieving stack templates, though it fails immediately. I’ve attached that stack trace as well.
North Central Stack Trace.txt (6.9 KB)
Timeout Stack Trace.txt (9.0 KB)


#7

The first issue looks like an actual bug of some sort. I’ve submitted an issue on our tracker at https://github.com/ManageIQ/manageiq-providers-azure/issues/83.

The second issue is a known one that we addressed by upgrading the azure-armrest version. Newer versions of the azure-armrest gem skip over a storage account when they hit a timeout error on it.

While this solution is not perfect, there is little else we can do because the method required to get private images from unmanaged storage is a bit ugly, owing partly to the fact that Azure provides no API for it. Note that this issue goes away for managed storage/images.

Anyway, if you can upgrade your azure-armrest gem, at least the timeout issues should go away.
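As a rough picture of the skip-on-timeout idea described above (not the gem's actual code; the method name, the account list and the block are all hypothetical, and the real gem may rescue a different error class):

```ruby
require "timeout"

# Hypothetical helper: runs the given block once per storage account and
# collects the results, skipping any account whose lookup times out.
def collect_images(accounts, log: [])
  accounts.flat_map do |account|
    begin
      yield(account)
    rescue Timeout::Error => e
      # Record the skip and contribute nothing for this account.
      log << "skipping #{account}: #{e.message}"
      []
    end
  end
end

log = []
images = collect_images(%w[fast slow other], log: log) do |account|
  raise Timeout::Error, "execution expired" if account == "slow"
  ["#{account}-image"]
end

puts images.inspect
puts log.inspect
```

The trade-off is the one mentioned above: images in a skipped account simply won't show up for that refresh.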


#8

I’ve upgraded the azure-armrest gem to version 0.7.4, but the issue seems to persist. The old gem is still installed, but bundle list displays 0.7.4, so I think the app is using that.
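One way to double-check which version the running process actually loaded, rather than what is installed, is to ask RubyGems from a console inside the app. A small sketch (the helper name is mine; Gem.loaded_specs is standard RubyGems):

```ruby
# Returns the version string of a gem loaded in the current process,
# or nil if that gem has not been loaded here.
def loaded_gem_version(name)
  spec = Gem.loaded_specs[name]
  spec && spec.version.to_s
end

puts loaded_gem_version("azure-armrest").inspect
```

This prints nil if it is run somewhere the gem isn't loaded, so it only tells you something when run in the application's own environment.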

Either way, is there a way to disable looking through the unmanaged storage outright? We only use unmanaged images as a bridge to get a managed image, and we have quite a few storage accounts. Also, thanks for opening that bug report.


#9

Please apply these patches:


If you are still having issues after that, please let me know the details and I’ll help you from there.


#10

After applying the patch, the bug in the no-templates case is fixed, but I’m still experiencing intermittent timeouts on one subscription and consistent timeouts on a second. Both fail with 503: Service Unavailable, and the refresh errors out when that occurs.


#11

A 503 would indicate a problem on the Azure side, and it wasn’t something we explicitly checked for. Please do me a favor and submit an issue for this, and I’ll see what I can do.
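For reference, one common way to cope with transient 503s is to retry with a growing delay. This is only a sketch of that general technique, not what the provider or the gem does; the error class and helper below are hypothetical.

```ruby
# Hypothetical stand-in for whatever error a 503 response surfaces as.
class ServiceUnavailable < StandardError; end

# Retries the block on ServiceUnavailable, doubling the delay each time,
# and re-raises once `attempts` tries have been used up.
# `sleeper` is injectable so the waits can be observed or skipped.
def with_retries(attempts: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  tries = 0
  begin
    tries += 1
    yield
  rescue ServiceUnavailable
    raise if tries >= attempts
    sleeper.call(base_delay * (2**(tries - 1)))
    retry
  end
end

calls = 0
delays = []
result = with_retries(sleeper: ->(s) { delays << s }) do
  calls += 1
  raise ServiceUnavailable, "503" if calls < 3
  :refreshed
end

puts result.inspect
puts delays.inspect
```

A consistently failing subscription would still error out after the last attempt, so this only helps with the intermittent case.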