Automation fails after 10 minutes


#1

Hello,

I have a heavy automation method that takes a long time to run. The problem is that it fails every time after exactly 10 minutes with the same message:

The following error occurred during method evaluation: SignalException: SIGTERM

I understand that this is normal behavior, but I would really appreciate any advice on extending that time limit.

Thanks in advance.


#2

@gmccullough, can you assist @LubaLimon with their question or defer to an SME?


#3

I’ve managed to find it myself; sorry for bothering you.


#4

Hi, @LubaLimon,

Instead of increasing the timeout, you should create a state-machine that contains a state that uses the “retry” logic to loop and wait for the long-running task to end.

There are several examples of state-machine retries in the ManageIQ domain, such as the ones for provisioning, which include a CheckProvisioned state for retries. (See ManageIQ/Infrastructure/VM/Provisioning/StateMachines/VMProvision_VM/template)
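The retry pattern can be sketched in plain Ruby. This is a simplified model, not ManageIQ code: `FakeEvm` is a hypothetical stand-in for the `$evm` object the Automate engine provides, imitating only `root`, `get_state_var` and `set_state_var`; `task_finished?` is a made-up check that reports completion on the third poll; and the loop at the end plays the engine's role by re-running the state while it returns 'retry'. The retry interval is given as 60 seconds, where ManageIQ methods commonly write `1.minute`.

```ruby
# FakeEvm is a hypothetical stand-in for ManageIQ's $evm (illustration only).
class FakeEvm
  attr_reader :root

  def initialize(state_vars)
    @root = {}               # root attributes: fresh on every pass in this model
    @state_vars = state_vars # state vars: shared across retries
  end

  def get_state_var(name)
    @state_vars[name]
  end

  def set_state_var(name, value)
    @state_vars[name] = value
  end
end

# Hypothetical completion check: the external task is "done" on the 3rd poll.
def task_finished?(poll_count)
  poll_count >= 3
end

# Body of a CheckProvisioned-style state: succeed if done, otherwise retry.
def check_state
  polls = ($evm.get_state_var('polls') || 0) + 1
  $evm.set_state_var('polls', polls)
  if task_finished?(polls)
    $evm.root['ae_result'] = 'ok'
  else
    $evm.root['ae_result'] = 'retry'
    $evm.root['ae_retry_interval'] = 60 # seconds
  end
end

# Simplified engine loop: each pass is a new "process" that keeps only state vars.
state_vars = {}
results = []
loop do
  $evm = FakeEvm.new(state_vars)
  check_state
  results << $evm.root['ae_result']
  break unless $evm.root['ae_result'] == 'retry'
end
p results # => ["retry", "retry", "ok"]
```

The poll counter lives in a state var rather than a local variable because each retry starts the method from scratch.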

You can read more about state-machines here:
https://pemcg.gitbooks.io/introduction-to-cloudforms-automation/content/chapter12/state_machines.html


#5

Thanks for the advice. I’ve created a state-machine, but it still doesn’t work. Maybe you could take a look at what’s wrong.
Here’s my Instance that runs on provision:


#6

And this is the Instance that fails:

Sorry, I can only put one image in a post.


#7

A state-machine alone does not resolve the timeout issue. Each process has a 10-minute timeout, so any single automate resolution can run for up to 10 minutes. When you introduce a retry, the process restarts and the timeout is reset as well.
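The arithmetic can be modelled in a few lines of Ruby. This is a simplified sketch, assuming the 10-minute timer covers a whole process and that a retry ends that process: given how many minutes each state takes and after which states a retry occurs, it reports how long each separate process runs. The durations are made-up example values.

```ruby
# Simplified model (assumption): each Automate process gets a fixed budget,
# and a retry ends the current process so the next pass starts a fresh budget.
TIMEOUT_MINUTES = 10

# durations: minutes each state takes, in order.
# breaks: indexes of states that are followed by a retry (process restart).
# Returns the elapsed minutes of each separate process.
def process_durations(durations, breaks)
  processes = [0]
  durations.each_with_index do |minutes, i|
    processes[-1] += minutes
    processes << 0 if breaks.include?(i) && i < durations.size - 1
  end
  processes
end

def times_out?(durations, breaks)
  process_durations(durations, breaks).any? { |m| m > TIMEOUT_MINUTES }
end

states = [8, 4, 3] # e.g. slow pre-processing, then two more states
p process_durations(states, [])  # => [15]
p process_durations(states, [0]) # => [8, 7]
```

Without a retry the three states share one 15-minute process and time out; a retry after the first state splits the work into two processes of 8 and 7 minutes. The model also shows the limit of the approach: a single state that needs more than 10 minutes still times out, which is why the long-running work itself has to be launched asynchronously.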

The idea for long-running tasks is to run them asynchronously so that the automate state-machine can perform a retry, which allows the timeout to be reset.

Taking our provisioning state-machine as an example, we send the request to the provider to create the new VM, then enter the CheckProvisioned state, which retries while waiting for the VM to be identified as a resource in the database. It could take a long time for the provider to create the VM depending on many factors, but the retry breaks up the process so we do not encounter the 10-minute timeout limit for a single process.

Similarly, retirement has a retry state waiting for the machine to power off.

I do not know the details of your add_template method, but you would need to determine how to launch the process asynchronously and then add a state between the create_template and import_template states that retries until the create has completed.
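A sketch of that layout, under stated assumptions: `FakeEvm` is a hypothetical stand-in for `$evm`, `FAKE_JOBS` simulates the external system that performs the create, and `check_template` is a hypothetical name for the new state between create_template and import_template. The create state records a job handle in a state var and returns immediately; the check state retries until the job reports done. None of this is ManageIQ API beyond the `$evm` method names being imitated.

```ruby
# Hypothetical stand-in for the parts of $evm used here (illustration only).
class FakeEvm
  attr_reader :root

  def initialize(state_vars)
    @root = {}
    @state_vars = state_vars # kept across retries in this model
  end

  def set_state_var(name, value)
    @state_vars[name] = value
  end

  def get_state_var(name)
    @state_vars[name]
  end
end

FAKE_JOBS = {} # simulated external system: job id => checks left before "done"

# Launch the work and return at once, remembering only the handle.
def create_template
  job_id = "job-#{FAKE_JOBS.size + 1}"
  FAKE_JOBS[job_id] = 2 # pretend the job needs two more checks
  $evm.set_state_var('job_id', job_id)
  $evm.root['ae_result'] = 'ok'
end

# Poll the job by its handle; ask for a retry while it is still running.
def check_template
  job_id = $evm.get_state_var('job_id')
  if FAKE_JOBS[job_id].zero?
    $evm.root['ae_result'] = 'ok'
  else
    FAKE_JOBS[job_id] -= 1
    $evm.root['ae_result'] = 'retry'
    $evm.root['ae_retry_interval'] = 60 # seconds
  end
end

def import_template
  $evm.root['ae_result'] = 'ok'
end

# Simplified engine: run states in order; 'retry' re-runs the same state in a
# fresh FakeEvm (a new "process"), keeping only the state vars.
STATES = %i[create_template check_template import_template]
state_vars = {}
trace = []
index = 0
while index < STATES.size
  $evm = FakeEvm.new(state_vars)
  send(STATES[index])
  trace << [STATES[index], $evm.root['ae_result']]
  index += 1 unless $evm.root['ae_result'] == 'retry'
end
p trace
```

Because each pass only checks the job and exits, no single process has to wait for the whole create to finish.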


#8

Thanks a lot, that really helped.


#9

Here is an example of issuing a single retry to break up a long-running process:

```ruby
#
# Description: This method requests a single retry to force a break in the
#              processing of long-running tasks.
#
$evm.log(:info, "Checking if retry needs to be set")
if $evm.state_var_exist?('retry_once')
  # Second pass: the break already happened, so continue normally.
  ae_result = 'ok'
else
  # First pass: remember that we retried, then ask the engine to retry.
  $evm.set_state_var('retry_once', '1')
  ae_result = 'retry'
  $evm.log(:info, 'Setting a retry once in the beginning')
end
$evm.root['ae_result'] = ae_result
$evm.root['ae_retry_interval'] = 1.minute
```

You could add logic based on your requirements and insert this method between steps in a long-running state machine. This example was used to break up a pre-processing step during Service Provisioning that took about 7 to 8 minutes due to network latency; the subsequent states would then fail because they had only 2 to 3 minutes left to finish. By introducing a break, you get the full 10 minutes for executing the rest of the state machine. If each of your states takes a long time to complete, you might need to store an identifier such as a handle or request_id in a state_var and loop on it with a retry.
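To see the effect of placing the retry-once logic between steps, the following sketch runs it under a hypothetical engine loop. `FakeEvm`, `pre_process` and `provision` are illustrative stand-ins, not ManageIQ APIs, and the retry interval is written as 60 seconds instead of `1.minute`. The loop records which states ran in each process, showing that the single retry starts a new process (and therefore a fresh 10-minute budget) before `provision`.

```ruby
# Stand-in for the parts of ManageIQ's $evm used here (illustration only).
class FakeEvm
  attr_reader :root

  def initialize(state_vars)
    @root = {}
    @state_vars = state_vars # survives retries in this model
  end

  def state_var_exist?(name)
    @state_vars.key?(name)
  end

  def set_state_var(name, value)
    @state_vars[name] = value
  end
end

# Hypothetical slow step and follow-up step; bodies are placeholders.
def pre_process
  $evm.root['ae_result'] = 'ok'
end

# Same decision logic as the retry-once method above.
def retry_once
  if $evm.state_var_exist?('retry_once')
    $evm.root['ae_result'] = 'ok'
  else
    $evm.set_state_var('retry_once', '1')
    $evm.root['ae_result'] = 'retry'
    $evm.root['ae_retry_interval'] = 60 # seconds
  end
end

def provision
  $evm.root['ae_result'] = 'ok'
end

# Simplified engine: a fresh FakeEvm per state call, sharing only state_vars;
# a 'retry' closes the current process and re-runs the same state in a new one.
states = %i[pre_process retry_once provision]
state_vars = {}
processes = [[]] # states executed in each process
i = 0
while i < states.size
  $evm = FakeEvm.new(state_vars)
  send(states[i])
  processes.last << states[i]
  if $evm.root['ae_result'] == 'retry'
    processes << []
  else
    i += 1
  end
end
p processes # => [[:pre_process, :retry_once], [:retry_once, :provision]]
```

The retry-once state runs twice, but the state var makes it request a retry only on the first run.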