Ansible-runner getting stuck before running playbook command

I’m seeing new behavior in the last few weeks on a MIQ appliance that had been running for about 18 months without issues: the automation engine now fails nearly 100% of the time to run an embedded Ansible playbook.

When I look at the process list on the server, I see something like the following for each active provisioning request, which makes it look like the ssh-add step in the runner command is hanging. I don’t see anything in evm.log or automation.log that I can identify as an error.

I went to look at the ssh_key_data pipe file, but I’m not sure how to debug it further. Trying to connect another process to that file hangs. I assume the pipe is supposed to be connected to ansible-runner, and that this is where things are stuck?

root      4885     1  0 Feb08 ?        00:00:02 /usr/bin/python2 /usr/bin/ansible-runner start /tmp/ansible-runner20220208-30949-75p01z --json --ident result --playbook provision.yml --project-dir /tmp/ansible-runner-git20220208-14751-1iimnkl
root      4887  4885  0 Feb08 pts/0    00:00:00 sh -c ssh-add /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data && rm -f /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data && ansible-playbook --become-method sudo -i /tmp/ansible-runner20220208-30949-75p01z/inventory -e @/tmp/ansible-runner20220208-30949-75p01z/env/extravars provision.yml
root      4888  4887  0 Feb08 ?        00:00:00 /usr/bin/ssh-agent sh -c ssh-add /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data && rm -f /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data && ansible-playbook --become-method sudo -i /tmp/ansible-runner20220208-30949-75p01z/inventory -e @/tmp/ansible-runner20220208-30949-75p01z/env/extravars provision.yml
root      4889  4887  0 Feb08 pts/0    00:00:00 ssh-add /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data
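
For what it’s worth, these are roughly the checks I’ve been trying against the pipe and the stuck ssh-add (the temp directory and PID 4889 are taken from the listing above and will differ per run):

# Confirm the key file really is a named pipe (FIFO) and not a regular file
stat -c '%F %n' /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data

# See which processes, if any, currently have the pipe open
lsof /tmp/ansible-runner20220208-30949-75p01z/artifacts/result/ssh_key_data

# Attach to the stuck ssh-add (PID 4889 above) and see which syscall it is
# blocked in; an open()/read() on the FIFO that never returns suggests
# nothing is ever writing the key into the pipe
strace -f -p 4889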

Thanks for any tips on debugging this.

Is this the same as Execution of ansible-runner hangs with no output either to stdout or stderr. · Issue #993 · ansible/ansible-runner · GitHub? We ran into that issue recently when one of our contributors had CrowdStrike Falcon installed and it interfered with Ansible Runner.

We deployed CrowdStrike Falcon about 3 months ago, so that’s an interesting correlation, and that description exactly matches what I’m observing.

Apologies for not searching for issues over on the ansible-runner github before posting here.

Ugh, going down that rabbit hole. Aside from a quick test, my IT department is NOT going to let us disable or uninstall falcon-sensor on the MIQ appliance. Maybe I can convince them to put in an exclusion rule for /tmp/ansible-runner*. I did do a quick one-off test and confirmed that after stopping falcon-sensor and then triggering embedded Ansible, the job works as expected.
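
In case anyone wants to reproduce that quick test, it was essentially this (the service name falcon-sensor is what our RHEL-based appliance uses; obviously re-enable it right afterwards):

# Temporarily stop the CrowdStrike sensor (needs IT sign-off)
sudo systemctl stop falcon-sensor

# ...trigger the embedded Ansible provisioning request from MIQ and
# confirm the ansible-runner job now completes...

# Start the sensor again when done
sudo systemctl start falcon-sensor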

The current workaround I’m building is a new method that makes an external call to Jenkins to run the playbook, waits for completion, and reports the status back to the automation engine.
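
Roughly what I have in mind, sketched with curl against the Jenkins REST API; the job name, credentials, and polling interval are placeholders, and the real logic would live in an automate method (polling lastBuild is a simplification here, the proper way is to follow the queue item URL Jenkins returns in the Location header):

# Trigger a parameterized Jenkins job that wraps the playbook
curl -s -X POST -u "$JENKINS_USER:$JENKINS_TOKEN" \
  "$JENKINS_URL/job/run-provision-playbook/buildWithParameters" \
  --data-urlencode "TARGET_HOST=$TARGET_HOST"

# Poll until Jenkins reports a result, then hand it back to the automation engine
while true; do
  result=$(curl -s -u "$JENKINS_USER:$JENKINS_TOKEN" \
    "$JENKINS_URL/job/run-provision-playbook/lastBuild/api/json" |
    python -c 'import json,sys; print(json.load(sys.stdin).get("result") or "")')
  [ -n "$result" ] && break
  sleep 15
done
echo "Jenkins build result: $result"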

Apologies for not searching for issues over on the ansible-runner github before posting here.

No apologies needed… it took us quite a while of searching to finally find that one too.

With respect to a fix, I’m afraid the only one at the moment is the exclusion. Ultimately, though, this is a Falcon bug in my opinion, in that they are blocking legitimate named-pipe usage. You might be able to open a bug ticket with them through your employer.