Infra provider won't populate

All, Hoping I can get some guidance on this issue I am having. I am a newbie so please forgive me if this has been addressed before.

RHV Production - 4.3.10.4-0.1.el7
RHV Dev - 4.3.8.2-0.4.el7

I have two rhv stacks. one for dev with only two hosts. another for production with 10 hosts. when I add the dev environment as a provider under compute/infra/providers. it works fine, it populates all hosts and vm. Not enabling C&U yet. Just trying to pull in the hosts an vm’s. however when I do the same thing with the prod rhv environment it does auth no problem and adds it to the list of providers but never populates anything, no hosts no vm’s, nothing. the only thing I can find in the Config/diag/mgiq logs is something about “failed to process queues” reason " http response code is 500" not sure if that is related but that’s all I find ( pasted below). I have tested with both TLS on and off. add cert of rhv-m when TLS is set to yes. and auth works. Just no populating. I have also testing by adding my vcenter stack and this works fine and populates all host’s and vm’s as well.

Any help on this would be much appreciated. I am unsure where to find detailed logs for this so even that would be helpful. Thanks all.

########
MIQ(ManageIQ::Providers::Redhat::InfraManager::FuturesCollector#process_queues) failed to process queues, due to: Fault reason is “Operation Failed”. HTTP response code is 500.
[----] I, [2020-11-10T12:28:09.817014 #99001:2ad464e8994c] INFO – : Exception in realtime_block :ems_refresh - Timings: {:collect_inventory_for_targets=>
########
2020-11-10T12:28:09.817125 #99001:2ad464e8994c] ERROR – : MIQ(ManageIQ::Providers::Redhat::InfraManager::Refresher#refresh) EMS: [LAS-RHV], id: [18] Refresh failed
[----] E, [2020-11-10T12:28:09.817186 #99001:2ad464e8994c] ERROR – : [RuntimeError]: Failed to fetch attributes for vm Method:[block (2 levels) in class:LogProxy]
[----] E, [2020-11-10T12:28:09.817219 #99001:2ad464e8994c] ERROR – : /opt/manageiq/manageiq-gemset/bundler/gems/manageiq-providers-ovirt-d45edfdca71e/app/models/manageiq/providers/redhat/infra_manager/inventory.rb:51:in `collect_inv_with_attributes_async’

T

I have tried everything i can think of. What i have found is if I add a new vm it will show up in manageiq. so it seams its not getting any history. I have checked the history db and created a user and added it to the provider. but still nothing. re added the provider several times. and it it only sees new vm’s still no hosts. nothing more.

Is there anyone that can offer some sort of advice on this? Anyone??

Thanks

I believe the issue I am having here is related to my Rhv-manager… I have added the same rhv-m to three different ManageIQ builds and I get the same result. I can add the provider but it does not load anything. 0 hosts 0vm’s 0 everything. I would love some advice here. I know this is not a forum for rev. but maybe the logs in Miq could point me in a direction. Anyone?

Thanks all.

Still hoping someone can point me in the right direction here. I have been digging for over a month now and I can’t seem to isolate the issue. this is only happening with our prod RHV-M. Please anyone.

Thanks
T

Are you using a fully admin account as the RHV provider credentials (such as admin@internal)?

You could take a look at the timings of the breakdown of the inventory collection to see if that gives you any clues. The raw lines are in evm.log, but you could also use the ems_refresh_timings.rb script from here: https://github.com/RHsyseng/cfme-log-parsing if it helps.

Try to force a full refresh using the Configuration -> Refresh Relationships and Power States button drop-down before you look at the timings.

Also might be worth checking if any of your workers are being killed, try the following on each of your appliances:

zgrep 'MiqServer#exceeded_memory_threshold' /var/www/miq/vmdb/log/evm.log* | grep "WARN\|ERROR"

Hope this helps,
pemcg

Pemcg,
First thank you for your help on this. It is greatly appreciated.

When you say look at the timings of the breakdown, what sort of pattern am I looking for here? I am not sure what to expect to see here. I see the attempt to refresh and at some point I get the HTTP response code 500. Below is a small snippet of the evm.log after doing a refresh. Can we tell if that 500 error is a result of reaching out to the RHV-manager or something in the Manageiq appliance? If it is a result of reaching out to the rhv-m. then where could I find those logs on the manager?

Next. I checked using your zgrep line. and I only have one instance back on the 8 of dec showing worker exceeded. I would think I wold see many of these if this was a continuing issue, correct,
And yes. the provider is authed using admin@interal

Thanks again for your help on this.
T

I, [2020-12-16T14:58:20.969079 #2694:2b16f21bd968] INFO – : MIQ(ManageIQ::Providers::Redhat::InfraManager::Refresher#collect_inventory_for_targets) Filtering inventory for ManageIQ::Providers::Redhat::InfraManager [RHEV-M (192.168.13.149)] id: [2000000000005]…
[----] I, [2020-12-16T14:58:20.969186 #2694:2b16f21bd968] INFO – : MIQ(ManageIQ::Providers::Redhat::InfraManager#with_provider_connection) Connecting through ManageIQ::Providers::Redhat::InfraManager: [RHEV-M (192.168.13.149)]
[----] I, [2020-12-16T14:58:21.203774 #2694:2b16f21bd968] INFO – : MIQ(ManageIQ::Providers::Redhat::InfraManager::Refresher#collect_inventory_for_targets) Filtering inventory…Complete
[----] I, [2020-12-16T14:58:21.203969 #2694:2b16f21bd968] INFO – : MIQ(ManageIQ::Providers::Redhat::InfraManager#with_provider_connection) Connecting through ManageIQ::Providers::Redhat::InfraManager: [RHEV-M (192.168.13.149)]
[----] I, [2020-12-16T14:58:21.302372 #2694:2b16f21bd968] INFO – : MIQ(ManageIQ::Providers::Redhat::InfraManager::Refresher#refresh_targets_for_ems) EMS: [RHEV-M (192.168.13.149)], id: [2000000000005] Refreshing target ManageIQ::Providers::Redhat::InfraManager [RHEV-M (192.168.13.149)] id [2000000000005]…
[----] D, [2020-12-16T14:58:21.302530 #2694:2b16f21bd968] DEBUG – : MIQ(ManageIQ::Providers::Redhat::InfraManager::Refresher#parse_targeted_inventory) EMS: [RHEV-M (192.168.13.149)], id: [2000000000005] Parsing inventory…
[----] I, [2020-12-16T14:58:21.302734 #2694:2b16f21bd968] INFO – : MIQ(ManageIQ::Providers::Redhat::InfraManager#with_provider_connection) Connecting through ManageIQ::Providers::Redhat::InfraManager: [RHEV-M (192.168.13.149)]
[----] I, [2020-12-16T14:58:21.551877 #2650:2ad31f8fc31c] INFO – : MIQ(ManageIQ::Providers::Redhat::InfraManager#with_provider_connection) Connecting through ManageIQ::Providers::Redhat::InfraManager: [RHEV-M (192.168.13.152)]
[----] E, [2020-12-16T14:58:22.938817 #2694:2b16f21bd968] ERROR – : MIQ(ManageIQ::Providers::Redhat::InfraManager::FuturesCollector#process_queues) failed to process queues, due to: Fault reason is “Operation Failed”. HTTP response code is 500.
[----] I, [2020-12-16T14:58:22.938989 #2694:2b16f21bd968] INFO – : Exception in realtime_block :ems_refresh - Timings: {:collect_inventory_for_targets=>0.6915237903594971, :parse_inventory=>1.6363098621368408, :parse_targeted_inventory=>1.636439561843872, :ems_refresh=>2.328165054321289}
[----] E, [2020-12-16T14:58:22.939246 #2694:2b16f21bd968] ERROR – : MIQ(ManageIQ::Providers::Redhat::InfraManager::Refresher#refresh) EMS: [RHEV-M (192.168.13.149)], id: [2000000000005] Refresh failed
[----] E, [2020-12-16T14:58:22.939396 #2694:2b16f21bd968] ERROR – : [RuntimeError]: Failed to fetch attributes for vm Method:[block (2 levels) in class:LogProxy]
[----] E, [2020-12-16T14:58:22.939489 #2694:2b16f21bd968] ERROR – : /opt/manageiq/manageiq-gemset/bundler/gems/manageiq-providers-ovirt-d45edfdca71e/app/models/manageiq/providers/redhat/infra_manager/inventory.rb:51:in `collect_inv_with_attributes_async

It looks to me like that 500 error is coming back from the RHV-M, are you able to check the logs on the RHV-M server?

pemcg

I was digging in /var/log/httpd for any issues. and I can see all the connections logs response code 200, but nothing i the error.log or ssl_error.log. where else would this log be?
Access.log -> 192.168.13.149 - - [17/Dec/2020:15:09:44 +0000] “GET /ovirt-engine/services/health HTTP/1.1” 200 31 “-” “Java/1.8.0_252”

Also strange is if I create a new vm in Rhv. that vm shows up in Miq. but any past vm’s won’t. not hosts either… only the new vm’s and the Storage domains that the vm lives on.

Thanks again Pemcg.

T

This is a small snippet of the /var/log/httpd/ssl_access.log after i ran a refresh on the infra provider.
Question is. several lines show 500 and some show 200… so I am thinking these may be the 500 errors i am seeing. They seem to always take place on the “diskattatchments” line. I am going to research that further but any idea what that is?

Thanks.

192.168.8.219 - - [16/Dec/2020:19:58:10 +0000] “GET /ovirt-engine/api/vms/f8fdd1c4-77b7-4374-8a85-dfb4f81e0221/diskattachments HTTP/1.1” 500 118

192.168.8.219 - - [16/Dec/2020:19:58:10 +0000] “GET /ovirt-engine/api/vms/ee83662d-ef7d-4ac3-bef2- 500 e3c2ad5a5/diskattachments HTTP/1.1” 200 360

192.168.8.219 - - [16/Dec/2020:19:58:10 +0000] “GET /ovirt-engine/api/vms/ee83662d-ef7d-4ac3-bef2- 500 e3c2ad5a5/snapshots HTTP/1.1” 200 326

192.168.8.219 - - [16/Dec/2020:19:58:10 +0000] “GET /ovirt-engine/api/vms/ee83662d-ef7d-4ac3-bef2- 500 e3c2ad5a5/nics HTTP/1.1” 200 589

192.168.8.219 - - [17/Dec/2020:13:08:52 +0000] “GET /ovirt-engine/api/vms/ee83662d-ef7d-4ac3-bef2- 500 e3c2ad5a5/diskattachments HTTP/1.1” 200 360

192.168.8.219 - - [17/Dec/2020:13:08:52 +0000] “GET /ovirt-engine/api/vms/ee83662d-ef7d-4ac3-bef2- 500 e3c2ad5a5/reporteddevices HTTP/1.1” 200 342

192.168.8.219 - - [17/Dec/2020:13:08:52 +0000] “GET /ovirt-engine/api/vms/ee83662d-ef7d-4ac3-bef2- 500 e3c2ad5a5/snapshots HTTP/1.1” 200 326

192.168.8.219 - - [17/Dec/2020:13:08:52 +0000] “GET /ovirt-engine/api/vms/f8fdd1c4-77b7-4374-8a85-dfb4f81e0221/diskattachments HTTP/1.1” 500 118

192.168.8.219 - - [17/Dec/2020:13:08:52 +0000] “GET /ovirt-engine/api/vms/ee83662d-ef7d-4ac3-bef2- 500 e3c2ad5a5/nics HTTP/1.1” 200 589

192.168.8.219 - - [17/Dec/2020:15:51:55 +0000] “GET /ovirt-engine/api/vms/ee83662d-ef7d-4ac3-bef2- 500 e3c2ad5a5/diskattachments HTTP/1.1” 200 360

192.168.8.219 - - [17/Dec/2020:15:51:55 +0000] “GET /ovirt-engine/api/vms/ee83662d-ef7d-4ac3-bef2- 500 e3c2ad5a5/snapshots HTTP/1.1” 200 326

192.168.8.219 - - [17/Dec/2020:15:51:55 +0000] “GET /ovirt-engine/api/vms/ee83662d-ef7d-4ac3-bef2- 500 e3c2ad5a5/reporteddevices HTTP/1.1” 200 342

192.168.8.219 - - [17/Dec/2020:15:51:55 +0000] “GET /ovirt-engine/api/vms/f8fdd1c4-77b7-4374-8a85-dfb4f81e0221/diskattachments HTTP/1.1” 500 118

You probably want to look in /var/log/ovirt-engine, probably engine.log and server.log. A 500 error will be a Java stack trace somewhere.

pemcg

It also seems to the same VM each time. Try making that same API call to the RHV-M using something like curl or Postman and see if you can reproduce the problem. If so then it might be time to log a RHV support ticket with Red Hat.

Also maybe see what the VM name is and have a look at the disks in the RHV-M console to see if anything looks not right with that VM.

pemcg

I didn’t notice it was the same vmid each time… good call. . I am a novice in using any Api manually but I will try to figure out how to do that. Also i would love to open a ticket with RH but if I mention the work manageIp or Cloudforms they yell at me that I don’t have support for it. So if I can isolate the error in rhv-m then I can point them to that… at least that’s the goal.

I did find these in the engine.log just search for the vm id.

2020-12-17 15:51:55,453Z ERROR [org.ovirt.engine.api.restapi.resource.validation.MappingExceptionMapper] (default task-554) Mapping exception while processing “GET” request for path “/vms/ f8fdd1c4-77b7-4374-8a85-dfb4f81e0221 /diskattachments”

I was able to isolate the vm id. it belongs to a vm i use for testing… standard windows vm. I will remove this vm and see if the issue continues.

fingers crossed.

T

I can really recommend https://www.postman.com, it makes making API calls a breeze.

pemcg

Ok. That vm was only for testing so I just removed it completely. and once I did a refresh in Miq provider everything populated. :smile: I have been trying to resolve this for over a month so I can’t thank you enough Pemcg. curious why one bad disk could cause Miq to just stop in its tracks. but its working now.

I will take a look at that site. Several times during my digging I wanted to test an api call so I could see the response more clearly but I don’t have any experience with that.

Thanks again for all your help on this. you are the hero for the day.

Thanks Pemcg.

T

That’s great news!

I think the issue was that for a full inventory collection, MIQ sends a whole batch of asynchronous concurrent API requests to the RHV-M to gather the inventory, and then polls on each of them to complete. If any of them returns a 500 error then the monitoring of the requests is aborted (I guess “internal server error” is pretty fatal), and the inventory collection fails. That’s why you were getting individual VMs fine, but not the whole inventory.

pemcg

Hmm. Now that makes sense. and would explain the strange behavior. Wish it was easier to isolate. but thank you for your help pinpointing this.

T