I have been successfully implementing provisioning and retirement workflows in ManageIQ for a while, and each time, during the design/debug phase, I have to deal with non-working state machines that leave a lot of crap in the services ManageIQ is integrated with.
For example, I create fixed address records on my DHCP server, so that the virtual machine is SSH-able once started. While debugging, I quite regularly misconfigure some item and my state machine aborts, so I have to manually delete the DHCP record to keep my environment clean.
So, I would like to implement a rollback policy that triggers the cleanup steps. Even before ManageIQ was released, this subject had been covered by @ramrexx, who handled it with a call to a method that takes the state and the VM as arguments, looking like this:
# Undo everything done up to (and including) the state that failed.
def cleanup(state, vm)
  $evm.log("info", "Cleaning up from state '#{state}':")
  case state
  when 'AcquireIPAddress'
    $evm.log("info", "Calling ReleaseIPAddress")
    $evm.instantiate("/Infrastructure/VM/Provisioning/StateMachinesMethods/ReleaseIPAddress")
  when 'RegisterDHCP'
    $evm.log("info", "Calling ReleaseIPAddress")
    $evm.instantiate("/Infrastructure/VM/Provisioning/StateMachinesMethods/ReleaseIPAddress")
    $evm.log("info", "Calling UnregisterDHCP")
    $evm.instantiate("/Infrastructure/VM/Provisioning/StateMachinesMethods/UnregisterDHCP")
  when 'RegisterDNS'
    $evm.log("info", "Calling ReleaseIPAddress")
    $evm.instantiate("/Infrastructure/VM/Provisioning/StateMachinesMethods/ReleaseIPAddress")
    $evm.log("info", "Calling UnregisterDHCP")
    $evm.instantiate("/Infrastructure/VM/Provisioning/StateMachinesMethods/UnregisterDHCP")
    $evm.log("info", "Calling UnregisterDNS")
    $evm.instantiate("/Infrastructure/VM/Provisioning/StateMachinesMethods/UnregisterDNS")
  when 'Provision'
    # The VM already exists at this point, so retire it.
    $evm.log("info", "Calling vm.retire_now for cleanup")
    vm.retire_now
  else
    $evm.log("info", "Nothing to be done.")
    return
  end
end
You can then either call it in the ‘rescue’ clause of your methods, or use the ‘on_error’ method to point to a rollback method. However, I see two caveats to this approach:
- Maintenance can become quite difficult over time, because every change to the workflow has to be mirrored in the cleanup method.
- It covers only the VMProvision_VM state machine. I need to be able to roll back all the items of a service bundle if one of them fails to be provisioned.
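To illustrate the maintenance caveat, here is a minimal sketch of a table-driven variant, runnable outside ManageIQ. The names `ROLLBACK_BASE`, `ROLLBACK_STEPS` and `cleanup_paths` are hypothetical and not part of @ramrexx's method; the state names and instance paths are the ones from the snippet above. The idea is that adding a state means adding one row instead of editing every `when` branch.

```ruby
# Hypothetical helper: each provisioning state is listed once, in workflow
# order, next to the instance that undoes it.
ROLLBACK_BASE = "/Infrastructure/VM/Provisioning/StateMachinesMethods".freeze
ROLLBACK_STEPS = [
  %w[AcquireIPAddress ReleaseIPAddress],
  %w[RegisterDHCP UnregisterDHCP],
  %w[RegisterDNS UnregisterDNS]
].freeze

# Returns the instance paths to call when `failed_state` fails, in the same
# order as the case statement above. Unknown states yield an empty list.
def cleanup_paths(failed_state)
  index = ROLLBACK_STEPS.index { |state, _undo| state == failed_state }
  return [] if index.nil?

  ROLLBACK_STEPS[0..index].map { |_state, undo| "#{ROLLBACK_BASE}/#{undo}" }
end

puts cleanup_paths('RegisterDHCP')
```

Inside an Automate method, each returned path would presumably be passed to `$evm.instantiate`; the ‘Provision’ case, which calls `vm.retire_now`, would still need separate handling.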
So, here are a few thoughts on this that I would like to develop with your help and wisdom.
Whenever a step in the provisioning workflow fails, ManageIQ should be able to launch a rollback workflow.
How to trigger the rollback?
- Instantiate a state machine:
  - From the state in the ‘rescue’ statement,
  - From the state machine engine, in the method called by ‘on_error’.
- Forward state information to the state machine (keep track of the required information for rollback):
  - $evm.root:
    - The $evm.root is the same because we are in the same workspace.
    - In every state that performs an action, we could add some information in $evm.root or in the
  - $evm.root['miq_provision']: this object might not exist in all state machines.
The code could look like:
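begin
  @state = 'RegisterDHCP'
  # Everything the matching rollback state needs to undo this step.
  @stateData = { :rollback_state => 'UnregisterDHCP', :vmname => 'cfme001', :ip_address => '1.1.1.1',
                 :mac_address => '00:00:00:00:00:01', :domain => 'example.com' }
  # Create the container on first use, otherwise the next line fails on nil.
  $evm.root['workflow'] ||= {}
  $evm.root['workflow'][@state] = @stateData
  exit MIQ_OK
rescue => err
  # 'exit' raises SystemExit, which this rescue does not catch.
  $evm.log("error", "#{@method} - [#{err}]\n#{err.backtrace.join("\n")}")
  # Hand over to the rollback state machine, then abort this one.
  $evm.instantiate("/Infrastructure/VM/Provisioning/StateMachines/VMProvision_VM/template_rollback")
  exit MIQ_ABORT
end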
Then we create a state machine, /Infrastructure/VM/Provisioning/StateMachines/VMProvision_VM/template_rollback, which is just a “mirror” of the provisioning state machine. Each state can access the $evm.root['workflow'] information to roll back its counterpart state.
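As a rough sketch of how the rollback side could consume that information, the snippet below turns the recorded hash into an ordered list of undo steps. It runs in plain Ruby, with an ordinary Hash standing in for `$evm.root['workflow']`. The helper name `rollback_plan` is hypothetical, and undoing the most recent state first is an assumption of mine, not something the approach above prescribes.

```ruby
# Hypothetical helper: builds the list of undo steps from the data that each
# provisioning state recorded. Ruby hashes keep insertion order, so reversing
# the entries undoes the most recently completed state first.
def rollback_plan(workflow)
  (workflow || {}).to_a.reverse
    .select { |_state, data| data.is_a?(Hash) && data[:rollback_state] }
    .map { |state, data| { :undo => data[:rollback_state], :for_state => state, :data => data } }
end

# Plain Hash standing in for $evm.root['workflow'].
workflow = {}
workflow['AcquireIPAddress'] = { :rollback_state => 'ReleaseIPAddress', :ip_address => '1.1.1.1' }
workflow['RegisterDHCP'] = { :rollback_state => 'UnregisterDHCP', :ip_address => '1.1.1.1',
                             :mac_address => '00:00:00:00:00:01' }

rollback_plan(workflow).each do |step|
  puts "#{step[:undo]} (undoes #{step[:for_state]})"
end
```

States that recorded nothing are simply skipped, so a rollback state only runs when its counterpart actually did something.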
When rolling back a service provisioning state machine, we should roll back all the items that are part of the service. This requires an inspection of the service to trigger the right state machine on each item.
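The inspection could look roughly like the sketch below. `FakeVm` and `FakeService` are stand-ins so that it runs outside the appliance, and the `children` field and the helper names are hypothetical. It assumes the real service object exposes its VMs and child services in a similar way, and it uses `retire_now` as the per-item rollback, where a real implementation would pick the right rollback state machine for each item.

```ruby
# Stand-ins for ManageIQ objects (assumption: the real service model offers
# comparable accessors for its VMs and its child services).
FakeVm = Struct.new(:name, :retired) do
  def retire_now
    self.retired = true
  end
end
FakeService = Struct.new(:name, :vms, :children)

# Collects the VMs of a service and of all nested child services.
def service_items(service)
  (service.vms || []) + (service.children || []).flat_map { |child| service_items(child) }
end

# Rolls back every item that is not already retired and returns their names.
def rollback_service(service)
  service_items(service).reject(&:retired).each(&:retire_now).map(&:name)
end

db = FakeService.new('db', [FakeVm.new('db01', false)], [])
web = FakeService.new('web', [FakeVm.new('web01', false), FakeVm.new('web02', true)], [])
bundle = FakeService.new('bundle', [], [db, web])

puts rollback_service(bundle).inspect
```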