How does Automate recover if the active appliance dies?

I was wondering how Automate is supposed to recover in a scenario like this:

There are at least two appliances with the Automate role enabled in the same zone. An Automate action (like provisioning a VM) is triggered and executed.

If the appliance which is currently executing the action dies for some reason, how does MIQ recover?

Does the Automate state machine continue from the last step on the second appliance? Or does it cancel the full request as failed?

Somewhat related: if there are more than two appliances with Automate enabled, does MIQ decide for each individual step which worker will execute the next state, or does it stick to the same appliance?

Any pointers are highly appreciated.

@mkanoor or @gmccullough any pointers here?

Currently there is no way to restart a task from a failed appliance. While a task is running, the failure could happen at any point in the internal workflow or in the Automate engine. We don’t have checkpoints for tasks or for the Automate engine that would allow us to restart a task.


So I suppose that also answers the second question: a given Automate task “sticks” to one appliance?

If an Automate task is running a state machine that has retries, then technically it is possible that the second or any subsequent retry request gets queued and is picked up by the next free worker in the zone, which could be on the same appliance or a different one. Each state machine retry is a restart of the request, except that previously processed states are skipped; any non-state-machine instances are resolved again to rebuild the workspace.
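To make the retry behaviour above concrete, here is a small Python simulation. It is only a sketch under stated assumptions, not ManageIQ code: the state names, `RetryMessage`, `resolve_non_state_machine_instances`, and the round-robin choice of "next free worker" are all hypothetical stand-ins. It shows a retry being queued as a new message, a possibly different appliance picking it up, earlier states being skipped, and the workspace being resolved again on every run.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical state machine; names are illustrative, not ManageIQ's real states.
STATES = ["validate", "provision", "check_provisioned", "post_provision"]


@dataclass
class RetryMessage:
    request_id: int
    next_state: int  # index of the state to resume at (earlier states are skipped)
    attempt: int = 0


def resolve_non_state_machine_instances(request_id):
    # Hypothetical: rebuilt from scratch on every run, nothing is carried over
    # from the worker that handled the previous attempt.
    return {"request_id": request_id, "status": "ok"}


def run_state(name, attempt):
    # Pretend check_provisioned needs two retries before the VM is ready.
    if name == "check_provisioned" and attempt < 2:
        return "retry"
    return "ok"


def process(message, worker, log):
    workspace = resolve_non_state_machine_instances(message.request_id)
    log.append((worker, "resolve_workspace", workspace["status"]))
    for index in range(message.next_state, len(STATES)):
        name = STATES[index]
        result = run_state(name, message.attempt)
        log.append((worker, name, result))
        if result == "retry":
            # Queue a new message that resumes at this state on the next attempt.
            return RetryMessage(message.request_id, index, message.attempt + 1)
    return None


def run_zone(workers):
    queue = deque([RetryMessage(request_id=1, next_state=0)])
    log = []
    turn = 0
    while queue:
        message = queue.popleft()
        # Assumption: "next free worker" modelled as simple round-robin.
        worker = workers[turn % len(workers)]
        turn += 1
        follow_up = process(message, worker, log)
        if follow_up is not None:
            queue.append(follow_up)
    return log


if __name__ == "__main__":
    for entry in run_zone(["appliance-A", "appliance-B"]):
        print(entry)
```

With two appliances, the first retry of `check_provisioned` lands on appliance-B while `validate` and `provision` run only once, yet the workspace is resolved three times. Note that the sketch only covers retries between runs; as described above, a worker dying in the middle of a state has no checkpoint to resume from.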

Thank you very much.