Capablanca not starting


#1

Hello,

I just downloaded the .ova, imported it into oVirt 3.6.5, created a VM and started it.
After having setup the basics (fixed network setup, date+time, hostname, timezone, reboot…), the /var/www/miq/vmdb/log/evm.log is showing that some things are running, then crashing with things like :

MIQ(MiqMemcached::Control.stop) stopped memcached, result: Redirecting to /bin/systemctl stop memcached.service
/var/www/miq/vmdb/gems/pending/util/runcmd.rb:8:in `runcmd’: memcached: aucun processus trouvé (RuntimeError)

As I saw Capablanca RC1 - RHEV - Not starting on boot, I had a look at the Gemfile file, but it was OK. I ran the bundle install and rebooted, but the same crash appears.
I have no http nor https port open and I’m surprised because I did not change anything special from the .ova…

What can I check?


#2

Weird. @jrafanie @carbonin Thoughts here?


#3

As a start can we see the output from systemctl status memcached and systemctl status evmserverd?

If either of those have failed, the output from journalctl -u memcached or journalctl -u evmserverd would probably be a good place to start.

Thanks


#4

[root@manageiq2 ~]# systemctl status memcached
memcached.service - Memcached
Loaded: loaded (/usr/lib/systemd/system/memcached.service; enabled)
Active: inactive (dead) since dim. 2015-12-20 19:40:42 CET; 1min 8s ago
Process: 1559 ExecStart=/usr/bin/memcached -u $USER -p $PORT -m $CACHESIZE -c $MAXCONN $OPTIONS (code=killed, signal=TERM)
Main PID: 1559 (code=killed, signal=TERM)
CGroup: /system.slice/memcached.service

déc. 20 19:39:43 manageiq2.sdis.isere.fr systemd[1]: Started Memcached.
déc. 20 19:40:42 manageiq2.sdis.isere.fr systemd[1]: Stopping Memcached…
déc. 20 19:40:42 manageiq2.sdis.isere.fr systemd[1]: Stopped Memcached.
[root@manageiq2 ~]# systemctl status evmserverd
evmserverd.service - EVM server daemon
Loaded: loaded (/usr/lib/systemd/system/evmserverd.service; enabled)
Active: deactivating (stop) (Result: exit-code) since dim. 2015-12-20 19:40:42 CET; 1min 27s ago
Process: 860 ExecStart=/bin/sh -c /bin/evmserver.sh start (code=exited, status=0/SUCCESS)
Main PID: 2815 (code=exited, status=1/FAILURE); : 2959 (evmserver.sh)
CGroup: /system.slice/evmserverd.service
ââcontrol
ââ2959 /bin/bash /bin/evmserver.sh stop
ââ2961 /var/www/miq/vmdb/script/rake

déc. 20 19:40:12 manageiq2.sdis.isere.fr sh[860]: Starting EVM…
déc. 20 19:40:12 manageiq2.sdis.isere.fr sh[860]: Running EVM in background…
déc. 20 19:40:12 manageiq2.sdis.isere.fr systemd[1]: Started EVM server daemon.
déc. 20 19:40:34 manageiq2.sdis.isere.fr crontab[2930]: (root) REPLACE (root)
déc. 20 19:40:42 manageiq2.sdis.isere.fr systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE
déc. 20 19:40:49 manageiq2.sdis.isere.fr sh[2959]: Stopping EVM gracefully…

journalctl -u memcached

déc. 18 16:49:07 manageiq2.sdis.isere.fr systemd[1]: Stopped Memcached.
déc. 18 16:51:10 manageiq2.sdis.isere.fr systemd[1]: Stopped Memcached.
– Reboot –
déc. 20 19:39:43 manageiq2.sdis.isere.fr systemd[1]: Starting Memcached…
déc. 20 19:39:43 manageiq2.sdis.isere.fr systemd[1]: Started Memcached.
déc. 20 19:40:42 manageiq2.sdis.isere.fr systemd[1]: Stopping Memcached…
déc. 20 19:40:42 manageiq2.sdis.isere.fr systemd[1]: Stopped Memcached.
déc. 20 19:42:41 manageiq2.sdis.isere.fr systemd[1]: Stopped Memcached.

déc. 18 16:52:42 manageiq2.sdis.isere.fr systemd[1]: Stopping EVM server daemon…
déc. 18 16:52:42 manageiq2.sdis.isere.fr systemd[1]: Stopped EVM server daemon.
– Reboot –
déc. 20 19:39:37 manageiq2.sdis.isere.fr systemd[1]: Starting EVM server daemon…
déc. 20 19:40:12 manageiq2.sdis.isere.fr sh[860]: Starting EVM…
déc. 20 19:40:12 manageiq2.sdis.isere.fr sh[860]: Running EVM in background…
déc. 20 19:40:12 manageiq2.sdis.isere.fr systemd[1]: Started EVM server daemon.
déc. 20 19:40:34 manageiq2.sdis.isere.fr crontab[2930]: (root) REPLACE (root)
déc. 20 19:40:42 manageiq2.sdis.isere.fr systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE
déc. 20 19:40:49 manageiq2.sdis.isere.fr sh[2959]: Stopping EVM gracefully…
déc. 20 19:42:12 manageiq2.sdis.isere.fr systemd[1]: evmserverd.service stopping timed out. Terminating.
déc. 20 19:42:12 manageiq2.sdis.isere.fr systemd[1]: Unit evmserverd.service entered failed state.
déc. 20 19:42:13 manageiq2.sdis.isere.fr systemd[1]: evmserverd.service holdoff time over, scheduling restart.
déc. 20 19:42:13 manageiq2.sdis.isere.fr systemd[1]: Stopping EVM server daemon…
déc. 20 19:42:13 manageiq2.sdis.isere.fr systemd[1]: Starting EVM server daemon…
déc. 20 19:42:19 manageiq2.sdis.isere.fr sh[3002]: Starting EVM…
déc. 20 19:42:19 manageiq2.sdis.isere.fr sh[3002]: Running EVM in background…
déc. 20 19:42:19 manageiq2.sdis.isere.fr systemd[1]: Started EVM server daemon.
déc. 20 19:42:35 manageiq2.sdis.isere.fr crontab[3125]: (root) LIST (root)
déc. 20 19:42:41 manageiq2.sdis.isere.fr systemd[1]: evmserverd.service: main process exited, code=exited, status=1/FAILURE
déc. 20 19:42:48 manageiq2.sdis.isere.fr sh[3148]: Stopping EVM gracefully…
déc. 20 19:44:11 manageiq2.sdis.isere.fr systemd[1]: evmserverd.service stopping timed out. Terminating.
déc. 20 19:44:11 manageiq2.sdis.isere.fr systemd[1]: Unit evmserverd.service entered failed state.
déc. 20 19:44:11 manageiq2.sdis.isere.fr systemd[1]: evmserverd.service holdoff time over, scheduling restart.
déc. 20 19:44:11 manageiq2.sdis.isere.fr systemd[1]: Stopping EVM server daemon…
déc. 20 19:44:11 manageiq2.sdis.isere.fr systemd[1]: Starting EVM server daemon…
déc. 20 19:44:19 manageiq2.sdis.isere.fr sh[3166]: Starting EVM…
déc. 20 19:44:19 manageiq2.sdis.isere.fr sh[3166]: Running EVM in background…
déc. 20 19:44:19 manageiq2.sdis.isere.fr systemd[1]: Started EVM server daemon.
déc. 20 19:44:34 manageiq2.sdis.isere.fr crontab[3287]: (root) REPLACE (root)
déc. 20 19:44:34 manageiq2.sdis.isere.fr crontab[3292]: (root) LIST (root)


#5

@neca I was able to deploy the ovirt ova successfully. Can you detail the exact steps you went through after deploy? Also, can you provide a larger excerpt from /var/www/miq/vmdb/log/evm.log from the time you deployed the appliance and also after running systemctl restart evmserverd?

Thanks


#6

Here are the things I may have done that could be noticed :

  • When creating the VM from the template in oVirt, I unchecked the Cloud init option, thinking it had nothing to prepare. Was I wrong?
  • The previous ManageIQ VM was in past-botvinik version, and was booting in DHCP, with a static reserved ip, working fine. I wiped the VM, but kept the MAC of its NIC, so the new capablanca VM would keep the same name and IP. It all worked well, but something strange occured : the capablanca VM registered in our dyndns the reverse record localhost -> 192.168.xx.yy (instead of… doing nothing and leaving 127.0.0.1 alone). It crashed our DNS, and took us many minutes to figure it out. Had it had a bad side effect on the capablanc VM? I don’t know.
    Anyway, the capablanca VM is now setup in pure static, NO dhcp setup.

That’s all.

I wiped the evm.log, rebooted it, waited some time, and here is the complete log : http://www.ecarnot.net/tmp/evm.log

I’m OK to wipe it all and restart, but I guess there is some lesson to learn from this setup failure, so I’ll wait for @Fryguy and/or @jrafanie to decide.


#7

Ah, I think I’ve found the issue.

At /var/www/miq/vmdb/lib/miq_memcached.rb +121 we try to killall -9 memcached, but that command returns an error if there is no process called memcached. We try to deal with that by only raising the error if the message doesn’t match the string “memcached: no process”, but because your error seems to be in French we are not catching it and are crashing.

The code in question is:

    def self.killall
      MiqUtil.runcmd("killall -9 memcached")
    rescue => err
      raise unless err.to_s =~ /memcached: no process/
    end

@Fryguy, we could add the -q flag to killall here. From the man page:

-q, --quiet
              Do not complain if no processes were killed.

@neca would you be willing to give that a try? Should be as easy as changing MiqUtil.runcmd("killall -9 memcached") to MiqUtil.runcmd("killall -q -9 memcached")


#8

Hi @carbonin,

Indeed, amongst the things I changed, I forgot to say I changed the locale.
I changed it in /etc/locale.conf : LANG=fr_FR.UTF-8, and I ran :
localectl set-keymap fr

I tried to add the “-q” param you suggested. Though I understand the logic, neither a restart of both memcached+evmserver, neither a complete reboot did change anything. The log are still showing that after the seeding is complete, memcached gets stopped (as seen in the provided log file).

I am now trying to reset the locale params to some more northern american and keep you aware.


#9

@neca Thanks for the update. Yeah, it looks like we would still fail as the -q option quiets the output, but does not cause killall to return a success status when no processes could be killed. In that case we would still be raising the error and wouldn’t catch it even if the error was in English because the regex wouldn’t match. So it seems that running without the change and changing the locale back should fix it.


#10

@carbonin I changed back every locale I could have changed, rebooted, but to no avail.
The log are still cycling the way they were in the provided log file.

I just completely wiped the VM and re-deployed it, just to see it working once, before RE-changing the locales and debug.

Stay tuned.


#11

@neca I was able to reproduce your issue by deploying a new capablanca appliance and running localectl set-locale LANG=fr_FR.UTF-8 && systemctl restart evmserverd

Then the problem resolved itself after running systemctl stop evmserverd && localectl set-locale LANG=en_US.UTF-8 && systemctl start evmserverd

I also opened an issue here (https://github.com/ManageIQ/manageiq/issues/6021) to track this.


#12

Aaaah… always troubles with the french :wink:
So glad you manage to reproduce this issue, and thanks for taking time to open the bug.
More seriously, I guess this could happen with some other non-US language, and maybe related to filters or hard-coded greps somewhere.

Merci beaucoup.

BTW : Once re-deployed, and LANG untouched, all is working fine.