We discovered an Ansible Service Git connection issue that occurs when the Embedded Ansible Service provision state machine encounters a problem checking out the Git repo. The Service provision fails without waiting for the Git connection or attempting to retry it.
Why it’s happening now:
We changed our Embedded Ansible implementation in Ivanchuk/5.11 to use ansible-runner instead of Ansible Tower, which was used previously.
Even though Ansible Service provisioning hasn’t been modified, the Embedded Ansible implementation changes affect the Service provisioning behavior.
What can I do about it?
Unfortunately, there’s no hot fix for the current release. Modifying the existing provisioning behavior would require a significant code change.
The good news is that we created a workaround for the issue.
We added code in the 5.11.10 release to support the Automate changes that are a key part of the workaround.
The back end changes can be found at: http://github.com/ManageIQ/manageiq/pull/20759
The code responsible for doing the git checkout and calling ansible-runner to run a playbook is called during the execute state of the Generic Service state machine. We can insert an Automate method in a state prior to the execute state that checks the git connection and retries the state machine, if necessary, until either the repo becomes available or the maximum number of retries has been exceeded. Once that method ends successfully (meaning it was able to connect to git), we can proceed to the execute state with a high degree of confidence that the Service provision will complete successfully.
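One way such a pre-check could probe the repository is sketched below. It is written in Python purely for illustration (Automate methods themselves are Ruby) and assumes that `git ls-remote` is an acceptable reachability test. The function name `git_repo_reachable` and the injectable `runner` parameter are hypothetical and are not taken from the actual check_connection method.

```python
import subprocess


def git_repo_reachable(repo_url, timeout=30, runner=subprocess.run):
    """Return True if `git ls-remote` can list the branch heads of repo_url.

    Hypothetical helper: the real check_connection method may probe the
    repository differently. `runner` is injectable so the logic can be
    exercised without network access.
    """
    try:
        result = runner(
            ["git", "ls-remote", "--heads", repo_url],
            capture_output=True,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # The probe timed out, or the git binary could not be started.
        return False
    # git exits with a non-zero status when the remote cannot be reached.
    return result.returncode == 0
```

A pre-check state would call a probe like this and ask the state machine to retry whenever it returns False.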
How it works:
The major part of the workaround is a new check_connection Automate method, which checks that the git repo is accessible before it allows the state machine to progress to the execute state. The check_connection method affects the state machine as follows:
If the git connection is:
- available the first time the method runs, the state machine proceeds to the execute state.
- initially unavailable, then becomes available some time before the max_retries attempts are used up, the state machine proceeds to the execute state.
- still not available after the max_retries attempts, the state machine aborts with a message that it has exceeded the (configurable) max_retries count specified for the pre5 state.
Automate Note - The max_retries setting in the Generic Service state machine pre5 state determines how many times to retry the git connection check. The default is 100.
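The three outcomes above can be simulated with a small retry loop. This is a minimal Python sketch for illustration only, not the Automate implementation: the function `run_pre_check`, its parameters, and the exact way retries are counted (one initial attempt followed by up to max_retries retries) are assumptions.

```python
import time


def run_pre_check(check, max_retries=100, retry_interval=0.0, sleep=time.sleep):
    """Simulate a retrying pre-check state placed before the execute state.

    Returns ("execute", retries_used) once check() succeeds, or
    ("abort", message) once max_retries retries have all failed.
    """
    retries = 0
    while True:
        if check():
            return ("execute", retries)
        if retries >= max_retries:
            return ("abort", f"exceeded max_retries ({max_retries})")
        retries += 1
        sleep(retry_interval)  # wait before the state is retried


# Outcome 1: the connection is available on the first attempt.
print(run_pre_check(lambda: True))

# Outcome 2: the connection fails twice, then becomes available.
answers = iter([False, False, True])
print(run_pre_check(lambda: next(answers), max_retries=5))

# Outcome 3: the connection never becomes available.
print(run_pre_check(lambda: False, max_retries=3))
```

Passing the check as a callable keeps the retry policy separate from how the repository is actually probed.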
The git_retry domain we created contains only 2 changes to the Automate model:
- A new check_connection method.
- A modified provision instance. The instance was modified to add a call to the check_connection method in the pre5 state.
The screenshot below shows that the ManageIQ system domain provision instance has no value in the pre5 state.
The screenshot below shows that the git_retry domain provision instance pre5 state has the value of METHOD::check_connection. Notice that the GenericLifecycle class contains the check_connection Automate method. The check_connection method is new and does not exist in the ManageIQ domain.
Note - Specifying an Automate method using the METHOD:: prefix (called method notation) allows us to use a state relationship to call an Automate method directly, without having to create or use an Instance to call the method.
How do I apply the workaround to my environment?
The workaround requires a minimum version of 5.11.10.
- Import and enable the git_retry domain.
- Create an Ansible Playbook Service.
- Order the Service.