With this topic, we want to understand how things are done.
Currently we have an issue that is hard to define clearly.
I’ll give it a try:
Our environment (darga-2): 8 servers (1 master and 7 workers) with one OpenStack provider configured.
About once a week we run into an issue where we make a provision call and it seems that nothing is working anymore…
Normally we auto-approve every request. When this issue appears, we get a message that we have to approve the request manually (with the Approve or Deny button on the request status page). This is the first sign of the issue.
If we approve the request at that point, there is only a message saying “automation starting”, but nothing happens for hours.
The workaround so far is to restart the platform. This has worked well in the past, but it cannot be the solution. One additional piece of information: last time, we restarted only one worker node and the platform began to process the request… Unfortunately, this is very hard for us to reproduce, and even harder to debug.
Is there anyone, a senior, architect or developer, who knows this issue or can try to explain what happens there?
In the past, we have asked many times for a short description of how to debug requests before the state machine is running. From that point on, you can add debug lines and so on, but if a request gets stuck at some earlier point (as I described above), is there any hint or description of how to debug those issues?
It would be great if somebody had a bit of time to discuss our problem.