Problem connecting to VMware vCenter after upgrading to Kasparov

Hi all,
has anyone experienced connection problems with the VMware provider after upgrading from Ivanchuk to Kasparov?

We are getting the following error stating that the refresh failed:
[----] E, [2021-06-28T12:30:49.230509 #161432:2b22192b23b4] ERROR -- : MIQ(ManageIQ::Providers::Vmware::InfraManager::Inventory::Collector#vim_collector) EMS: [VMware HCI vamg200], id: [9] Refresh failed
[----] E, [2021-06-28T12:30:49.230760 #161432:2b22192b23b4] ERROR -- : [ActiveRecord::ConnectionTimeoutError]: could not obtain a connection from the pool within 5.000 seconds (waited 5.000 seconds); all pooled connections were in use Method:[block (2 levels) in class:LogProxy]
[----] E, [2021-06-28T12:30:49.231070 #161432:2b22192b23b4] ERROR -- : /opt/manageiq/manageiq-gemset/gems/activerecord- block in wait_poll'
/opt/manageiq/manageiq-gemset/gems/activerecord- loop'
/opt/manageiq/manageiq-gemset/gems/activerecord- wait_poll'
/opt/manageiq/manageiq-gemset/gems/activerecord- internal_poll'
/opt/manageiq/manageiq-gemset/gems/activerecord- internal_poll'
/opt/manageiq/manageiq-gemset/gems/activerecord- block in poll'
/usr/share/ruby/monitor.rb:230:in mon_synchronize'
/opt/manageiq/manageiq-gemset/gems/activerecord- synchronize'
/opt/manageiq/manageiq-gemset/gems/activerecord- poll'
/opt/manageiq/manageiq-gemset/gems/activerecord- acquire_connection'
/opt/manageiq/manageiq-gemset/gems/activerecord- checkout'
/opt/manageiq/manageiq-gemset/gems/activerecord- connection'
/opt/manageiq/manageiq-gemset/gems/activerecord- retrieve_connection'
/opt/manageiq/manageiq-gemset/gems/activerecord- retrieve_connection'
/opt/manageiq/manageiq-gemset/gems/activerecord- connection'
/opt/manageiq/manageiq-gemset/gems/activerecord- connection'
/opt/manageiq/manageiq-gemset/gems/activerecord- alias_tracker'
/opt/manageiq/manageiq-gemset/gems/activerecord- scope'
/opt/manageiq/manageiq-gemset/gems/activerecord- scope'
/opt/manageiq/manageiq-gemset/gems/activerecord- association_scope'
/opt/manageiq/manageiq-gemset/gems/activerecord- scope'
/opt/manageiq/manageiq-gemset/gems/activerecord- scope'
/opt/manageiq/manageiq-gemset/gems/activerecord- find_target'
/opt/manageiq/manageiq-gemset/gems/activerecord- load_target'
/opt/manageiq/manageiq-gemset/gems/activerecord- load_target'
/opt/manageiq/manageiq-gemset/gems/activerecord- records'
/opt/manageiq/manageiq-gemset/gems/activerecord- each'
/opt/manageiq/manageiq-gemset/gems/activerecord- select'
/opt/manageiq/manageiq-gemset/gems/activerecord- select'
/var/www/miq/vmdb/app/models/mixins/authentication_mixin.rb:85:in authentication_userid_passwords'
/var/www/miq/vmdb/app/models/mixins/authentication_mixin.rb:491:in available_authentications'
/var/www/miq/vmdb/app/models/mixins/authentication_mixin.rb:257:in authentication_type'
/var/www/miq/vmdb/app/models/mixins/authentication_mixin.rb:479:in authentication_best_fit'
/var/www/miq/vmdb/app/models/mixins/authentication_mixin.rb:157:in auth_user_pwd'
/opt/manageiq/manageiq-gemset/bundler/gems/manageiq-providers-vmware-eb218a96aad1/app/models/manageiq/providers/vmware/infra_manager/inventory/collector.rb:137:in vim_connect'
/opt/manageiq/manageiq-gemset/bundler/gems/manageiq-providers-vmware-eb218a96aad1/app/models/manageiq/providers/vmware/infra_manager/inventory/collector.rb:61:in vim_collector'
/opt/manageiq/manageiq-gemset/bundler/gems/manageiq-providers-vmware-eb218a96aad1/app/models/manageiq/providers/vmware/infra_manager/inventory/collector.rb:55:in block in vim_collector_thread'

Also, when provisioning a VM in VMware, the VM is created in vCenter and the template is copied, but after that ManageIQ cannot get information about the created VM (though I can see it in the vCenter GUI):

[----] I, [2021-06-28T12:30:52.248820 #161419:2af1bb529964] INFO -- : Q-task_id([r2861_miq_provision_5162]) MIQ(ManageIQ::Providers::Vmware::InfraManager::Provision#poll_destination_in_vmdb) Unable to find Vm [vnnub0355-TB] with ems_ref [vm-688315], will retry

I checked the Providers section in ManageIQ and validated the VMware credentials.
With the same settings, everything works fine in Ivanchuk.

Any ideas would be appreciated.

Thank you!

Looking at the code, I don't see any changes to our configuration of the pool size, either in the defaults or in the puma-based workers; they're the same values from Ivanchuk to Kasparov.

In both Ivanchuk and Kasparov, the default pool size in production is 5; for the UI/web service (API) workers it is 8, and for remote console workers it is 14. It's possible you have local changes that are smaller than these values.

One thing you can do is enable debug logging in your test environment and increase the default pool size in database.yml from 5 to 10. Please make backups or take notes on every change you make so you can undo them at the end:

In advanced settings, change the value of level in the log section from info to debug, save, then restart.

  :level: debug
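For reference, the pool change might look like the fragment below in config/database.yml on the appliance. This is a sketch, not your actual file: the surrounding keys will differ in your environment, so back the file up first and only change the `pool` value.

```yaml
# /var/www/miq/vmdb/config/database.yml (sketch; keep a backup first)
production:
  adapter: postgresql
  encoding: utf8
  pool: 10   # temporarily raised from the default of 5 for this test
```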

Since this is in the provider collector worker, I don't think you need to modify the puma-based workers. Then you'll start seeing messages like this in /var/www/miq/vmdb/log/evm.log:

PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 1, in use: 1, waiting_in_queue: 0

You'll want to run several refreshes of the provider, then grep evm.log for these messages from the inventory collector/refresh worker; in your example, pid #161432.

Take note of how large the pool is (5 in the example above; it should be 10 if you changed it from 5 to 10), and most importantly what the "in use" count becomes. The "in use" value should rise and fall. Once you know the maximum it reaches, you can judge whether that's a concern. It should not exceed 5, since we haven't changed the default in a long time, but if it does, please review what the worker is doing at that time and open an issue.
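The grep-and-track step above can be sketched in Ruby. The PID and the message format are taken from the examples in this thread; treat the helper name as illustrative, not part of ManageIQ.

```ruby
# Sketch: given evm.log lines, report the highest "in use" value that
# PostgreSQLAdapter#log_after_checkout logged for a given worker PID.
def max_pool_in_use(lines, pid)
  lines.filter_map do |line|
    # only checkout messages from the worker we care about
    next unless line.include?("##{pid}") && line.include?("log_after_checkout")
    line[/in use: (\d+)/, 1]&.to_i
  end.max
end

# Two fabricated lines mimicking the debug message shown above.
sample = [
  "... #161432:2b22 ... PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 3, in use: 3, waiting_in_queue: 0",
  "... #161432:2b22 ... PostgreSQLAdapter#log_after_checkout, connection_pool: size: 5, connections: 5, in use: 5, waiting_in_queue: 1",
]
puts max_pool_in_use(sample, 161432)   # highest "in use" seen for that PID
```

Against a real appliance you would feed it the log instead, e.g. `max_pool_in_use(File.foreach("/var/www/miq/vmdb/log/evm.log"), 161432)`.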

Good luck.

This sounds like the bug we fixed here: Move save_inventory thread out of the inventory collector by agrare · Pull Request #700 · ManageIQ/manageiq-providers-vmware · GitHub

There was an issue where, if the main collector thread died, it would start a new save-inventory thread even if the old one was still running. If you hit an issue that caused the collector thread to die repeatedly, you would eventually run out of pooled connections.
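A minimal sketch of that failure pattern, with hypothetical names rather than the actual ManageIQ code: each restart of the dead collector unconditionally spawns another saver thread, and each leaked saver would hold a pooled connection.

```ruby
# Hypothetical illustration of the leak (not the real ManageIQ code):
# the supervising loop restarts the collector when it dies, but also
# spawns a fresh saver thread each time without checking whether the
# previous saver is still alive.
savers = []

3.times do
  # pretend the collector thread just died and we are restarting it...
  # buggy: a new saver is started on every restart
  savers << Thread.new { sleep }   # stands in for a saver holding a DB connection

  # fixed behavior (the idea behind the PR): reuse the running saver
  # savers << Thread.new { sleep } unless savers.last&.alive?
end

puts savers.count(&:alive?)   # leaked saver threads still running
```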