I’m seeing a new behavior with a MIQ appliance that has been running for about 18 months without issues. Within the last few weeks, nearly 100% of the automation engine’s attempts to run an embedded Ansible playbook have been failing.
When I look at the process list on the server, I see something like the following for each Active provisioning request, which makes it look like the ssh-add step in the runner command is hanging. I don’t see anything in evm.log or automation.log that I can identify as an error.
If I go look at the ssh_key_data pipe file, trying to connect another process to it just hangs, so I’m not sure how to debug further. I assume that might be the cause, and that the pipe is supposed to be connected to ansible-runner?
root 4885 1 0 Feb08 ? 00:00:02 /usr/bin/python2 /usr/bin/ansible-runner start /tmp/ansible-runner20220208-30949-75p01z --json --ident result --playbook provision.yml --project-dir /tmp/ansible-runner-git20220208-14751-1iimnkl
root 4887 4885 0 Feb08 pts/0 00:00:00 sh -c ssh-add /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data && rm -f /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data && ansible-playbook --become-method sudo -i /tmp/ansible-runner20220208-30949-75p01z/inventory -e @/tmp/ansible-runner20220208-30949-75p01z/env/extravars provision.yml
root 4888 4887 0 Feb08 ? 00:00:00 /usr/bin/ssh-agent sh -c ssh-add /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data && rm -f /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data && ansible-playbook --become-method sudo -i /tmp/ansible-runner20220208-30949-75p01z/inventory -e @/tmp/ansible-runner20220208-30949-75p01z/env/extravars provision.yml
root 4889 4887 0 Feb08 pts/0 00:00:00 ssh-add /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data
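In case it gives anyone a clue, this is roughly what I was planning to try next to inspect the hung ssh-add and the pipe. I’m not sure it’s the right approach, and the PIDs and paths are just the ones from the process list above:

# Confirm ssh_key_data really is a FIFO and see which processes have it open
stat /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data
lsof /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data

# See which syscall the hung ssh-add (PID 4889 above) is blocked in
strace -p 4889

# Check whether the parent ansible-runner process (PID 4885) still has the write end of the pipe open
ls -l /proc/4885/fd | grep ssh_key_data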
Thanks for any tips on debugging this.