Method in a state machine won't repeat

Hi all,
I am having a problem with a state method in a state machine that is set to repeat until it succeeds.


The method is named wait_for_completion_OMU and is set for 100 retries.

When the method runs and finishes, it sets ae_result: retry.
According to the MIQ log, however, it ends with MIQ_OK and won't retry.

Any idea why the ae_result: retry is not picked up and the method finishes?

Thank you for any hint!

With a state machine retry, the method should complete, but the whole state machine will be re-instantiated starting from the retrying method, after the retry interval time period has elapsed.

Here’s an example of a check_completed method:

$evm.log(:info, "In /Stuff/StateMachines/Methods/check_completed")
update_cmdb_request = $evm.vmdb(:miq_request, $evm.get_state_var(:update_cmdb_request_id))
$evm.log(:info, "update_cmdb_request ID = #{update_cmdb_request.id}")
$evm.log(:info, "request state = #{update_cmdb_request.state}")
case update_cmdb_request.state
when "pending", "active"
  $evm.log(:info, "Request still active, waiting for 15 seconds...")
  $evm.root['ae_retry_interval'] = '15.seconds'
  $evm.root['ae_result']         = 'retry'
when "finished"
  $evm.log(:info, "Request complete!")
  $evm.root['ae_result'] = 'ok'
else
  $evm.log(:info, "Not sure what happened")
  $evm.root['ae_result'] = 'error'
end
exit MIQ_OK

What did you set as your $evm.root['ae_retry_interval']?

pemcg

Thank you. Here is the entire method that should be repeated:

#
# Description: Given an Ansible Job Id, check its status
#

module ManageIQ
  module Automate
    module AutomationManagement
      module AnsibleTower
        module Operations
          module StateMachines
            module Job
              class WaitForCompletion
                JOB_CLASS = 'ManageIQ_Providers_AnsibleTower_AutomationManager_Job'.freeze
                def initialize(handle = $evm)
                  @handle = handle
                end

                def main
                  check_status(ansible_job)
                  @handle.log(:info, "ae_result: #{@handle.root['ae_result']}, ae_retry_interval: #{@handle.root['ae_retry_interval']}")
                  puts "XXXOMUXXX"
                end

                private

                def check_status(job)
                  begin
                    status, reason = job.normalized_live_status
                    case status
                    when 'transient'
                      @handle.root['ae_result'] = 'retry'
                      @handle.root['ae_retry_interval'] = '5.seconds'
                    when 'failed', 'create_canceled'
                      @handle.root['ae_result'] = 'error'
                      @handle.log(:error, "Job failed for #{job.id} Ansible ID: #{job.ems_ref} reason #{reason}")
                      #variables for status_email method
                      $evm.set_state_var("awx_adc_check_status_error", "true")
                      $evm.set_state_var("awx_adc_check_status_job_id", job.id)
                      $evm.set_state_var("awx_adc_check_status_ems_ref", job.ems_ref)
                      $evm.set_state_var("awx_adc_check_status_error_reason", reason)
                      job.refresh_ems
                      #exit(MIQ_ABORT)
                    when 'create_complete'
                      @handle.root['ae_result'] = 'ok'
                      job.refresh_ems
                    else
                      @handle.root['ae_result'] = 'error'
                      @handle.log(:error, "Job failed for #{job.id} Ansible ID: #{job.ems_ref} Unknown status #{status} reason #{reason}")
                      job.refresh_ems
                    end
                  rescue => err
                    @handle.log(:error, "(wait_for_completion) - [#{err}]\n#{err.backtrace.join("\n")}")
                    @handle.root['ae_result'] = 'retry'
                    # Use the same string form as the other retry interval
                    @handle.root['ae_retry_interval'] = '2.minutes'
                    begin
                      job.refresh_ems
                    rescue => refresh_err
                      # Log the refresh failure itself, not the original error
                      @handle.log(:error, "(wait_for_completion) refresh_ems - [#{refresh_err}]\n#{refresh_err.backtrace.join("\n")}")
                    end
                  end
                end

                def ansible_job
                  job_id = @handle.get_state_var(:ansible_job_id)
                  if job_id.nil?
                    @handle.log(:error, 'Ansible job id not found')
                    exit(MIQ_ERROR)
                  end
                  fetch_job(job_id)
                end

                def fetch_job(job_id)
                  job = @handle.vmdb(JOB_CLASS).find(job_id)
                  if job.nil?
                    @handle.log(:error, "Ansible job with id : #{job_id} not found")
                    exit(MIQ_ERROR)
                  end
                  job
                end
              end
            end
          end
        end
      end
    end
  end
end

ManageIQ::Automate::AutomationManagement::AnsibleTower::Operations::StateMachines::Job::WaitForCompletion.new.main
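
As an aside, because the class above takes the handle as a constructor argument, the status-to-ae_result mapping can be checked outside ManageIQ before looking at the state machine itself. The following is only a rough sketch: FakeHandle, FakeJob and apply_status are hypothetical names made up for illustration, not part of ManageIQ, and the mapping is a simplified copy of check_status without refresh_ems or the state variables.

```ruby
# Hypothetical stand-in for $evm: only root and log are modelled.
class FakeHandle
  attr_reader :root, :messages

  def initialize
    @root = {}
    @messages = []
  end

  # Collect log calls instead of writing to the automation log.
  def log(level, message)
    @messages << [level, message]
  end
end

# Hypothetical stand-in for the Ansible job service model.
FakeJob = Struct.new(:id, :ems_ref, :status, :reason) do
  def normalized_live_status
    [status, reason]
  end
end

# Simplified copy of the check_status mapping (assumption: no refresh_ems,
# no state variables), so it can run in plain Ruby.
def apply_status(handle, job)
  status, reason = job.normalized_live_status
  case status
  when 'transient'
    handle.root['ae_result'] = 'retry'
    handle.root['ae_retry_interval'] = '5.seconds'
  when 'create_complete'
    handle.root['ae_result'] = 'ok'
  else
    handle.root['ae_result'] = 'error'
    handle.log(:error, "Job #{job.id} (#{job.ems_ref}) ended with status #{status}: #{reason}")
  end
  handle.root['ae_result']
end

handle = FakeHandle.new
puts apply_status(handle, FakeJob.new(1, 'ref-1', 'transient', nil))
puts handle.root['ae_retry_interval']
```

If this reports retry for a transient job, the method itself is setting the values correctly, and the place to look is how the state machine is being called.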

And you never see the state machine re-run after 5 seconds (on any appliance)?

Exactly, unfortunately not. I don't know what I am doing wrong.

How are you calling the state machine initially?

From another state machine - like this:


(call_ansible_job)

This state machine is executed from a button added to a virtual machine service.

Try increasing the max retries in the outer state machine from 1. If an inner state machine in a set of nested state machines retries, each outer state machine is also re-instantiated, starting from the state that calls the inner state machine.


Thank you very much for your help. It worked!

:thumbsup: