How to kill a Q-task_id stuck in infinite loop?


#1

We have launched a service which calls custom, buggy Ruby code containing an infinite loop (we can see it in the log, since it logs things). We have deleted the request ID in the ManageIQ interface, but the Q-task-id is still running: how do we stop it?


#2

@gquentin, have you tried to kill the process that is still running?


#3

On your ManageIQ appliance, log in to the command line, then run:

vmdb
bin/rake evm:status

Checking EVM status...
 Zone    | Server Name  | Status  |            ID |  PID | SPID | URL                     | Started On           | Last Heartbeat       | Active Roles
---------+--------------+---------+---------------+------+------+-------------------------+----------------------+----------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 default | manageiq003  | started | 1000000000001 | 2970 | 2991 | druby://127.0.0.1:40255 | 2016-02-13T09:37:58Z | 2016-02-17T16:46:08Z | automate:database_operations:database_owner:ems_inventory:ems_metrics_collector:ems_metrics_coordinator:ems_metrics_processor:ems_operations:event:notifier:reporting:scheduler:smartproxy:smartstate:user_interface:web_services

 Worker Type                                                       | Status  |            ID |   PID | SPID  | Queue Name / URL      | Started On           | Last Heartbeat
-------------------------------------------------------------------+---------+---------------+-------+-------+-----------------------+----------------------+----------------------
 ManageIQ::Providers::Redhat::InfraManager::EventCatcher           | started | 1000000002786 |  3170 | 3261  | ems_1000000000001     | 2016-02-13T09:38:39Z | 2016-02-17T16:45:24Z
 ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker | started | 1000000002782 |  3156 | 3268  | redhat                | 2016-02-13T09:38:48Z | 2016-02-17T16:46:10Z
 ManageIQ::Providers::Redhat::InfraManager::MetricsCollectorWorker | started | 1000000002781 |  3153 | 3267  | redhat                | 2016-02-13T09:38:47Z | 2016-02-17T16:45:58Z
 ManageIQ::Providers::Redhat::InfraManager::RefreshWorker          | started | 1000000002952 |  4154 | 4161  | ems_1000000000001     | 2016-02-17T16:10:40Z | 2016-02-17T16:46:09Z
 MiqEmsMetricsProcessorWorker                                      | started | 1000000002783 |  3160 | 3298  | ems_metrics_processor | 2016-02-13T09:39:25Z | 2016-02-17T16:45:51Z
 MiqEmsMetricsProcessorWorker                                      | started | 1000000002784 |  3163 | 3296  | ems_metrics_processor | 2016-02-13T09:39:22Z | 2016-02-17T16:45:58Z
 MiqEventHandler                                                   | started | 1000000002787 |  3173 | 3300  | ems                   | 2016-02-13T09:39:24Z | 2016-02-17T16:46:16Z
 MiqGenericWorker                                                  | started | 1000000002789 |  3180 | 3311  | generic               | 2016-02-13T09:39:37Z | 2016-02-17T16:46:17Z
 MiqGenericWorker                                                  | started | 1000000002828 | 16277 | 16287 | generic               | 2016-02-14T04:11:17Z | 2016-02-17T16:46:18Z
 MiqPriorityWorker                                                 | started | 1000000002790 |  3185 | 3272  | generic               | 2016-02-13T09:38:52Z | 2016-02-17T16:46:18Z
 MiqPriorityWorker                                                 | started | 1000000002791 |  3188 | 3273  | generic               | 2016-02-13T09:38:53Z | 2016-02-17T16:46:18Z
 MiqReportingWorker                                                | started | 1000000002793 |  3194 | 3306  | reporting             | 2016-02-13T09:39:27Z | 2016-02-17T16:46:19Z
 MiqReportingWorker                                                | started | 1000000002792 |  3191 | 3305  | reporting             | 2016-02-13T09:39:27Z | 2016-02-17T16:46:18Z
 MiqScheduleWorker                                                 | started | 1000000002794 |  3198 | 3282  |                       | 2016-02-13T09:39:03Z | 2016-02-17T16:46:00Z
 MiqSmartProxyWorker                                               | started | 1000000002954 |  4225 | 4241  | smartproxy            | 2016-02-17T16:11:31Z | 2016-02-17T16:46:16Z
 MiqSmartProxyWorker                                               | started | 1000000002953 |  4222 | 4240  | smartproxy            | 2016-02-17T16:11:31Z | 2016-02-17T16:46:16Z
 MiqUiWorker                                                       | started | 1000000002797 |  3209 |       | http://127.0.0.1:3000 | 2016-02-13T09:39:03Z | 2016-02-17T16:45:53Z
 MiqWebServiceWorker                                               | started | 1000000002798 |  3252 |       | http://127.0.0.1:4000 | 2016-02-13T09:39:02Z | 2016-02-17T16:45:38Z

Note the PIDs of the MiqGenericWorker and MiqPriorityWorker workers (in my case 3180, 16277, 3185 and 3188).
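If you want to pull those PIDs out of the status output rather than reading them off by hand, a small sketch (it assumes the pipe-separated table layout shown above, with the PID in the fourth column; `worker_pids` is just a name I made up):

```shell
# Filter the worker table from `bin/rake evm:status` down to the PIDs of
# the MiqGenericWorker and MiqPriorityWorker rows.
# Assumes a pipe-separated table with the PID in column 4, as shown above.
worker_pids() {
  awk -F'|' '/MiqGenericWorker|MiqPriorityWorker/ { gsub(/ /, "", $4); print $4 }'
}

# Example on two rows copied from the status output above:
printf '%s\n' \
  ' MiqGenericWorker  | started | 1000000002789 | 3180 | 3311 | generic' \
  ' MiqPriorityWorker | started | 1000000002790 | 3185 | 3272 | generic' \
  | worker_pids
# prints:
# 3180
# 3185
```

In practice you would pipe the live output through it: `bin/rake evm:status | worker_pids`.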

You can watch the automation engine running in a worker, firing off a Ruby process for each automation instance that it runs:

watch -n 1 "ps -ef --forest | grep '3180\|16277\|3185\|3188' | grep -v 'watch\|grep'"

Every 1.0s: ps -ef --forest|grep '3180\|16277\|3185\|3188'|grep -v 'watch\|grep'  

root      3180  2970  0 Feb13 ?        00:27:44  \_ /var/www/miq/vmdb/lib/workers/bin/worker.rb
root     12091  3180 25 17:00 ?        00:00:00  |   \_ /opt/rh/rh-ruby22/root/usr/bin/ruby  <-- automation method running
root      3185  2970  0 Feb13 ?        00:23:38  \_ /var/www/miq/vmdb/lib/workers/bin/worker.rb
root      3188  2970  0 Feb13 ?        00:23:23  \_ /var/www/miq/vmdb/lib/workers/bin/worker.rb
root     16277  2970  0 Feb14 ?        00:23:42  \_ /var/www/miq/vmdb/lib/workers/bin/worker.rb

You should see your looping process as a child of one of the workers. You can kill this PID.
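A minimal demonstration of the kill step, using a stand-in busy-loop process rather than the real stuck Ruby child (whose PID, 12091 in the output above, will differ on your appliance). Trying SIGTERM before SIGKILL is my own suggestion, not something specific to ManageIQ:

```shell
# Start a stand-in "infinite loop" process, then kill it the same way you
# would kill the looping automation child found under the worker.
sh -c 'while :; do :; done' &   # stand-in for the stuck ruby child
loop_pid=$!

kill -TERM "$loop_pid"          # ask it to exit first
wait "$loop_pid" 2>/dev/null    # reap it; a tight loop may need -KILL instead

# Confirm it is gone (kill -0 probes for existence without sending a signal)
kill -0 "$loop_pid" 2>/dev/null || echo "process $loop_pid is gone"
```

If the real child ignores SIGTERM (common when it is stuck in a tight loop), repeat with `kill -KILL <pid>`.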

Hope this helps,
pemcg


#4

Thanks.


#5

I have the same problem, but in simulation: I don't see any child process. So how do I stop it?