Jump between StateMachines


#1

Hi everybody,
We need to implement some sort of rollback in our StateMachines, but I cannot figure out how to do that without going insane.

Use-Case:
Let’s say I have two StateMachines, ProvisionMyService and RetireMyService:

ProvisionMyService
    prov1
    prov2
    prov3
RetireMyService
    retire3
    retire2
    retire1

The steps are written so that retire3 undoes whatever prov3 did, retire2 undoes prov2, and so on.
Now the code in prov2 recognizes that the request is invalid and cannot be processed any further. The problem is that prov1 has already run, so the StateMachine needs to roll back those changes and also skip prov3.

Question
Has anybody implemented something like this? Any ideas how one would go about it?
We currently have one StateMachine that has specific error-handling states we can jump to using $evm.root['ae_next_state'], but it is only one StateMachine and this approach is not scalable.

Problem
I know that setting $evm.root['ae_next_state'] = 'state2' allows me to jump within a StateMachine.
As far as I know, it cannot be used to jump to another StateMachine. At least from reading the code, it does not seem to be possible (https://github.com/ManageIQ/manageiq-automation_engine/blob/master/lib/miq_automation_engine/engine/miq_ae_engine/miq_ae_state_machine.rb)

I am not aware of anything built into ManageIQ that would allow jumping between state machines.

Ideas I have got so far

  • Use some very generic embedded methods in the on_entry, on_exit and on_error steps to kick off the Retirement/Cleanup StateMachine if the StateMachine hit on_error at some point.
  • Use some clever $evm.instantiate() call to do the cleanup in one go and just exit with MIQ_ABORT. However, I think instantiate is not able to handle retries…

#2

If your ProvisionMyService was launched (even indirectly) from a request, you could perhaps write the current completed state into the request’s options hash from each provisioning state’s on_exit method, something like request.set_option(:state_complete, 'prov2'). Then if you were happy to perform your rollback asynchronously, you could fire off a create_automation_request to run RetireMyService from an on_error method, passing as an argument the request ID of the provision request.
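A rough sketch of those two pieces, equally untested against a real appliance. `record_completed_state` and `rollback_request_options` are hypothetical helper names, and the namespace, instance name and the `provision_request_id` attribute are placeholders. The options hash follows the shape usually passed to `$evm.execute('create_automation_request', ...)`, which is worth checking against your ManageIQ version.

```ruby
# Hypothetical helper: call from each provisioning state's on_exit method.
# `request` only needs to respond to set_option, as a ManageIQ request does.
def record_completed_state(request, state_name)
  request.set_option(:state_complete, state_name)
end

# Hypothetical helper: builds the options for an automation request that
# runs the retire state machine. Namespace and instance name are placeholders.
def rollback_request_options(provision_request_id, user_id)
  {
    :namespace     => 'MyNamespace',        # placeholder
    :class_name    => 'RetireMyService',
    :instance_name => 'default',            # placeholder
    :user_id       => user_id,
    :attrs         => { 'provision_request_id' => provision_request_id }
  }
end

# Inside an on_error method this might be used roughly like so (assumption):
#   options = rollback_request_options(request.id, $evm.root['user'].id)
#   $evm.execute('create_automation_request', options, 'admin', true)
```

Keeping the hash construction in its own method means it can be checked outside the automation engine; only the commented-out lines need a live `$evm`.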

The on_entry method of state_1 in RetireMyService could retrieve the request object, look up the most recent options[:state_complete], and set ae_next_state to jump to the appropriate undo stage.
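The lookup could be as simple as a hash from provisioning state to undo state. This is only a sketch: `UNDO_STATE_FOR`, `first_undo_state` and `jump_to_undo_state` are made-up names, the state names come from the example in the first post, and `root` is a plain Hash standing in for `$evm.root` so the snippet runs outside ManageIQ.

```ruby
# Maps the last completed provisioning state to the retire state that undoes it.
UNDO_STATE_FOR = {
  'prov1' => 'retire1',
  'prov2' => 'retire2',
  'prov3' => 'retire3'
}.freeze

# Returns the retire state to start from, or nil when nothing completed.
def first_undo_state(last_completed)
  UNDO_STATE_FOR[last_completed]
end

# `root` stands in for $evm.root. Sets ae_next_state only when there is
# something to undo, and returns the chosen state (or nil).
def jump_to_undo_state(root, last_completed)
  target = first_undo_state(last_completed)
  root['ae_next_state'] = target if target
  target
end

# In a real method the value would come from the provisioning request, e.g.
# (assumption) request.get_option(:state_complete).
```

In the scenario from the first post, prov2 fails after prov1 has completed, so the recorded value is 'prov1' and the retire state machine would jump straight to retire1, skipping retire3 and retire2.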

This also has the advantage that the retire state machine has access to the provisioning option hash, and so should have all of the information necessary to undo everything.

Just off the top of my head, totally unproven, not guaranteed to work :slight_smile:

pemcg


#3

Yeah. I had the same idea regarding the automation request, just as I hit the send button :slight_smile:.

I don’t see a reason why it shouldn’t work.
I am probably going to replace the request with a state variable, because buttons don’t have requests associated.

The question is how complicated it is going to be to implement that in a generic way, so people don’t have to do more than abort_workflow_now('/RetireMyService') or something similar.


#4

I just thought of a problem with the approach. If people (scripts) wait for the request to finish and poll the API, the request will be in Finished/Error, while the automation request does the cleanup. Therefore the next request might fail, because not all resources are released yet.

Either keep the original request alive while the automation task runs, or set the request to Finished/Cleanup and update it from the automation task.


#5

I agree, the provisioning state machine could poll and wait for completion of the decommissioning state machine before it completed.
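One way the waiting state might look, as a sketch only. `wait_for_cleanup` is a hypothetical helper, `root` is a plain Hash standing in for `$evm.root`, and `cleanup_state` is assumed to be the `request_state` of the cleanup automation request, with 'finished' as its terminal value. The retry itself relies on the usual `ae_result` / `ae_retry_interval` convention.

```ruby
# Decides whether the provisioning state machine should keep waiting for
# the cleanup request. Returns the ae_result value it has set.
def wait_for_cleanup(root, cleanup_state)
  if cleanup_state == 'finished'
    root['ae_result'] = 'ok'
  else
    # Ask the automation engine to re-run this state after a pause.
    root['ae_result']         = 'retry'
    root['ae_retry_interval'] = '1.minute'
  end
  root['ae_result']
end
```

Whether the state should end with 'ok' or 'error' once cleanup is done depends on how the provisioning request is meant to be reported, so treat the 'ok' above as a placeholder.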

Regarding the button, it might be a simpler workflow if the method launched from the button just issued a create_automation_request for the provisioning workflow. That way you’d always get a request object for both provisioning and decommission/tidy-up. In fact, that might help make it more generic.

pemcg


#6

Actually… great idea. I hadn’t thought about that yet :slight_smile: