We are still trying to sort out why most requests get stuck at “request_pending” for our system.
Below is a line-by-line comparison of the automation.log entries from two requests/processes, one that got “stuck” at “request_pending” and one that processed successfully.
Lines prefixed with “F” are from the process that failed / got stuck.
Lines prefixed with “S” are from the process that succeeded.
You’ll note that they are in step with one another until like 8 where the stuck process logs an extra entry. After that point, the two logs become very similar to one another again, minus some interesting differences.
I’ll put some commentary below these logs…
Line 1:
"S"[----] I, [2020-09-18T13:51:40.795014 #10072:4598e50] INFO -- : <AEMethod approve> Sending email to <Mike.Butak@expeditors.com> from <miq@expeditors.com> subject: <Request ID 403 - Your Service provision request has been approved by the level 2 approver.>
"F"[----] I, [2020-09-18T13:53:47.783933 #10063:838fd30] INFO -- : <AEMethod approve> Sending email to <Mike.Butak@expeditors.com> from <miq@expeditors.com> subject: <Request ID 403 - Your Service provision request has been approved by the level 3 approver.>
Line 2:
"S"[----] I, [2020-09-18T13:51:40.834755 #10072:69a5f0] INFO -- : <AEMethod [/EXP/custom/multiapprove/Email/approve]> Ending
"F"[----] I, [2020-09-18T13:53:48.223100 #10063:69a5f0] INFO -- : <AEMethod [/EXP/custom/multiapprove/Email/approve]> Ending
Line 3:
"S"[----] I, [2020-09-18T13:51:40.834883 #10072:69a5f0] INFO -- : Method exited with rc=MIQ_OK
"F"[----] I, [2020-09-18T13:53:48.223266 #10063:69a5f0] INFO -- : Method exited with rc=MIQ_OK
Line 4:
"S"[----] I, [2020-09-18T13:51:40.835341 #10072:69a5f0] INFO -- : Followed Relationship [miqaedb:/custom/multiapprove/Email/approve#create]
"F"[----] I, [2020-09-18T13:53:48.223701 #10063:69a5f0] INFO -- : Followed Relationship [miqaedb:/custom/multiapprove/Email/approve#create]
Line 5:
"S"[----] I, [2020-09-18T13:51:40.835539 #10072:69a5f0] INFO -- : Followed Relationship [miqaedb:/System/Policy/ServiceTemplateProvisionRequest_Approved#create]
"F"[----] I, [2020-09-18T13:53:48.223884 #10063:69a5f0] INFO -- : Followed Relationship [miqaedb:/System/Policy/ServiceTemplateProvisionRequest_Approved#create]
Line 6:
"S"[----] I, [2020-09-18T13:51:40.836421 #10072:69a5f0] INFO -- : Followed Relationship [miqaedb:/System/Policy/request_approved#create]
"F"[----] I, [2020-09-18T13:53:48.224340 #10063:69a5f0] INFO -- : Followed Relationship [miqaedb:/System/Policy/request_approved#create]
Line 7:
"S"[----] I, [2020-09-18T13:51:40.837442 #10072:69a5f0] INFO -- : Followed Relationship [miqaedb:/System/Event/RequestEvent/Request/request_approved#create]
"F"[----] I, [2020-09-18T13:53:48.224962 #10063:69a5f0] INFO -- : Followed Relationship [miqaedb:/System/Event/RequestEvent/Request/request_approved#create]
Line 8:
“S” NO COMPARABLE LOG ENTRY IN THIS FILE. The “Successful” request skips right to the next line…
"F"[----] I, [2020-09-18T13:53:52.840123 #10055:69a5f0] INFO -- : MiqAeEvent.build_evm_event >> event=<"request_pending"> inputs=<{"EventStream::event_stream"=>166633, :event_stream_id=>166633}>
Line 9:
"S"[----] I, [2020-09-18T13:51:44.974333 #10055:69a5f0] INFO -- : User [admin] with current group ID [2] name [EvmGroup-super_administrator]
"F"[----] I, [2020-09-18T13:54:39.015296 #10055:69a5f0] INFO -- : User [admin] with current group ID [1] name [Tenant My Company access]
Line 10:
"S"[----] I, [2020-09-18T13:51:44.990576 #10055:69a5f0] INFO -- : Instantiating [/System/Process/Event?EventStream%3A%3Aevent_stream=166630&MiqRequest%3A%3Amiq_request=403&MiqServer%3A%3Amiq_server=1&ServiceTemplateProvisionRequest%3A%3Aservice_template_provision_request=403&User%3A%3Auser=1&event_stream_id=166630&event_type=request_pending&object_name=Event&vmdb_object_type=service_template_provision_request]
"F"[----] I, [2020-09-18T13:54:39.017818 #10055:69a5f0] INFO -- : Instantiating [/System/Process/Event?EventStream%3A%3Aevent_stream=166634&MiqServer%3A%3Amiq_server=1&User%3A%3Auser=1&event_id=166634&event_stream_id=166634&event_type=job_create&object_name=Event&vmdb_object_type=event_stream]
Line 11:
"S"[----] I, [2020-09-18T13:51:45.020073 #10055:69a5f0] INFO -- : Updated namespace [/System/Process/Event?EventStream%3A%3Aevent_stream=166630&MiqRequest%3A%3Amiq_request=403&MiqServer%3A%3Amiq_server=1&ServiceTemplateProvisionRequest%3A%3Aservice_template_provision_request=403&User%3A%3Auser=1&event_stream_id=166630&event_type=request_pending&object_name=Event&vmdb_object_type=service_template_provision_request ManageIQ/System]
"F"[----] I, [2020-09-18T13:54:39.046677 #10055:69a5f0] INFO -- : Updated namespace [/System/Process/Event?EventStream%3A%3Aevent_stream=166634&MiqServer%3A%3Amiq_server=1&User%3A%3Auser=1&event_id=166634&event_stream_id=166634&event_type=job_create&object_name=Event&vmdb_object_type=event_stream ManageIQ/System]
Line 12:
"S"[----] I, [2020-09-18T13:51:45.088443 #10055:69a5f0] INFO -- : Following Relationship [miqaedb:/System/Event/RequestEvent/Request/request_pending#create]
"F"[----] I, [2020-09-18T13:54:39.110734 #10055:69a5f0] INFO -- : Following Relationship [miqaedb:/System/Event/EmsEvent/ANSIBLETOWER/job_create#create]
Commentary:
Lines 1-7: the two processes have basically identical logs
Line 8: the failed process logs a MiqAeEvent.build_evm_event entry for event=“request_pending”. The successful process has no such entry.
Line 9: the successful process says the user (admin) is member of group “Evm_Group-super_administration” while the failed process says that user (admin) is member of group “Tenant My Company access”. No idea what that might mean.
Line 10-11: the successful process calls EventStream with eventstream type of MiqRequest whereas the failed process calls EventStream with type MiqServer.
Line 12: the successful process procedes to call RequestEvent/Request/request_pending (which takes the process on to the next step in the process) while the failed process instead calls an EmsEvent/ANSIBLETOWER/job_create which is a placeholder we are not using. From this point on the failed process is stuck.
Everything seems to boil down to the unique log on line 8 in the failed process (the “MiqAeEvent.build_evm_event entry” with event=“request_pending”). Researching that method I found a discussion (here)[Openstack Provisioning stucks early with pending state] from several years ago where a team had this same behavior and it had to do with excessive events (several thousand) in the miq_queue. I was excited that this might be my problem, but we currently only have 8 event records in our miq_queue table. 
Thanks for reading this far if you are still with me. Any suggestions would be deeply appreciated!