ManageIQ appliance booting but no httpd (so nothing's working)


#1

Hello,

After importing an appliance and working with it since mid-December, a recent reboot of this VM shows that the boot process is OK, but no httpd process is running.

/var/www/miq/vmdb/log/evm.log shows some errors that I can’t dismiss as minor (EvmDatabase.seed Class OntapAggregateDerivedMetric does not exist, and others…)
(ERROR – : [ActiveRecord::RecordInvalid]: Validation failed: Description has already been…)

The only “special” thing I changed, and it worked OK afterwards, was to authenticate against Active Directory.

What should I provide or check to debug this behaviour?

Regards,


#2

@neca Did you update your code by git pull or replace an existing appliance with a newer one but kept the database? I’m not sure why OntapAggregateDerivedMetric does not exist or why you’re getting that validation error.


#3

Actually, after having seen this issue, I tried to update the code via git, but a git pull returned nothing new.
I also tried a yum upgrade, and it upgraded packages not directly related to ManageIQ (though important ones, like the kernel; but as I said, the complete OS boots with NO issue, the network is OK, and the filesystems are not full).

Could it help if I post the complete log on pastebin or similar?


#4

Here is the log, cropped at the relevant times :
http://www.ecarnot.net/tmp/evm.log


#5

@neca It’s not clear from the log why, but TimeProfile.seed is failing because another row in the time_profiles table has the same description as the one it’s trying to seed.

It’s failing here:


#6

Wow, I didn’t know I was SO skilled that I could damage an app just by doing basic things:

  • set the authentication against AD
  • set the root pwd
  • set the date and time, using the dedicated manageIQ TUI console

What next? Should I delete some specific row in some table, and hope it won’t show up again?


#7

I know nothing about databases, nor about ManageIQ database seeding, but I tried removing the only row in the time_profiles table, then rebooted.
I did that because :

  • It’s a very fresh install, and I have nothing to lose
  • I find that ManageIQ is a wonderful product that I want and need to manage our datacenters (oVirt and RHEV)
  • I want this issue to get solved.

Anyway, the error related to the timeprofile did not show up anymore, but the previous ones about seeding did (same as in the provided log).
As I have nothing to lose, is it possible to completely wipe the database and re-seed it? (and how?)


#8

@neca I’m not familiar with OntapAggregateDerivedMetric and the related OnTap* classes. Are these the errors that still occur?

ERROR – : EvmDatabase.seed Class OntapAggregateDerivedMetric does not exist
ERROR – : EvmDatabase.seed Class OntapDiskDerivedMetric does not exist

I don’t believe much of the UI or codebase uses those classes so I don’t know that it would affect your use of ManageIQ.

@rpo have you seen the errors found here: http://www.ecarnot.net/tmp/evm.log

I would venture to guess we can’t “OntapAggregateDerivedMetric”.constantize because the mixin’s init method is blowing up. https://github.com/manageiq/manageiq/blob/e2803088ee116f1c41b46161b9925b71e7677280/vmdb/app/models/mixins/ontap_derived_metric_mixin.rb#L30
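
One way to check whether the same seeding errors recur at every boot is to collapse the ERROR lines of evm.log into a count per distinct message. The function below is only a sketch: the name summarize_evm_errors is made up for illustration, the default path is the one quoted earlier in this thread, and it assumes each error line contains the literal word ERROR followed by a prefix ending in a colon, as in the excerpts above.

```shell
# Hypothetical helper (not part of ManageIQ): count distinct ERROR messages.
summarize_evm_errors() {
    # $1: log file to read; defaults to the evm.log path mentioned in this thread.
    local log="${1:-/var/www/miq/vmdb/log/evm.log}"

    # 1. keep only lines containing ERROR
    # 2. drop everything up to and including the "ERROR ... :" prefix,
    #    so timestamps and PIDs do not make identical messages look different
    # 3. count identical messages, most frequent first
    grep 'ERROR' "$log" \
        | sed -E 's/^.*ERROR[^:]*: *//' \
        | sort | uniq -c | sort -rn
}
```

Calling summarize_evm_errors with no argument reads the default path; passing a file name (for example a copy of the log trimmed to one boot) makes it easy to compare boots.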


#9

@jrafanie These errors appear at every boot.
The log file then stops growing, since nothing is running.

As I explained, I have nothing to lose, so I’m more than willing to try any command you may tell me to test, or database to wipe, or things to try.

Is re-seeding the database :

  • a good idea?
  • helpful?
  • possible via command line?

#10

Well, @neca, it’s not entirely clear what’s happening.

Re-seeding the database from scratch might be the only thing you can do without debugging the specific errors.

Normal server startup will try to seed the database and that’s what is reporting the errors you’re seeing in regards to the OnTap* classes.

To completely clear out the database (caution: you’ll have to re-enter all information such as providers and credentials, and all history will be lost), type vmdb to cd to the vmdb directory, then run bin/rake evm:db:reset. When that completes, start the server processes with service evmserverd start, or just reboot the VM.
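
The reset steps described above can be sketched as a small shell function. This is an illustration, not an official procedure: the function name reset_vmdb and the VMDB_DIR and DRY_RUN variables are invented here, and the default directory assumes that the vmdb shortcut leads to /var/www/miq/vmdb, the path that appears in the logs in this thread.

```shell
# Hypothetical wrapper around the reset steps. DESTRUCTIVE: wipes the database.
reset_vmdb() {
    # Assumption: the `vmdb` shortcut changes to this directory.
    local vmdb_dir="${VMDB_DIR:-/var/www/miq/vmdb}"
    # With DRY_RUN set to a non-empty value, print each command instead of running it.
    local run="${DRY_RUN:+echo}"

    $run cd "$vmdb_dir" || return 1
    # Drop and recreate the database; providers, credentials and history are lost.
    $run bin/rake evm:db:reset || return 1
    # Start the server processes again (a reboot would do the same).
    $run service evmserverd start
}
```

Running DRY_RUN=1 reset_vmdb first prints the three commands without touching anything, which is a cheap way to confirm the directory before doing the real reset.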


#11

OK Joe, here’s what I tried, with a reboot, patience and an eye on the evm.log, at each step, separately :

  • yum -y upgrade everything : not better
  • as I saw an error related to UTF-8 conversion, and as I may have entered french characters somewhere, I tried to set the /etc/sysconfig/i18n locale to fr_FR@UTF8
  • disable iptables
  • reset the DB according to your last post

The last attempt (resetting the DB) was the one that produced the fewest errors in the log. But like many previous attempts, it led to a stop around a memcached problem.

So, as I am still very motivated to help, I installed another ManageIQ in a different datacenter, at a different site, and once it was running, I took the time to diff the content of /var/www/miq. I saw nothing obviously different, apart from what may be called normal (logs, .git, cache…). I guess the relevant data are stored in the database.

Anyway, the second ManageIQ instance is running OK, and the first one, with a fresh reset DB, is failing with the error below.
So now, the difference lies in the fact that I yum upgraded the first instance.

The logs are :
[----] I, [2015-01-19T15:50:05.505697 #2070:a4b834] INFO – : EvmDatabase.seed Seeding… Complete
[----] I, [2015-01-19T15:50:05.754480 #2070:a4b834] INFO – : MIQ(MiqMemcached:Control.stop) stopped memcached, result: Arrêt de memcached : ^[[60G[^[[0;32m OK ^[[0;39m]
[----] I, [2015-01-19T15:50:05.879126 #2070:a4b834] INFO – : MIQ(MiqQueue.put) Message id: [4], id: [], Zone: [default], Role: [], Server: [55b41930-8906-11e4-b9cb-001a4abbb9a8], Ident: [miq_server], Target id: [], Instance id: [1], Task id: [], Command: [MiqServer.shutdown_and_exit], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: []
/var/www/miq/lib/util/runcmd.rb:9:in `runcmd': memcached: aucun processus tué (RuntimeError)
from /var/www/miq/vmdb/lib/miq_memcached.rb:121:in `killall'
from /var/www/miq/vmdb/lib/miq_memcached.rb:106:in `stop!'
from /var/www/miq/vmdb/lib/miq_memcached.rb:115:in `restart!'
from /var/www/miq/vmdb/app/models/miq_server/environment_management.rb:82:in `start_memcached'
from /var/www/miq/vmdb/app/models/miq_server.rb:274:in `start'
from /var/www/miq/vmdb/lib/workers/evm_server.rb:71:in `start'
from /var/www/miq/vmdb/lib/workers/evm_server.rb:85:in `start'
from /var/www/miq/vmdb/lib/workers/evm_server.rb:89:in `<top (required)>'
from /opt/rh/ruby193/root/usr/local/share/gems/bundler/gems/rails-f9749c2ef83b/railties/lib/rails/commands/runner.rb:52:in `eval'
from /opt/rh/ruby193/root/usr/local/share/gems/bundler/gems/rails-f9749c2ef83b/railties/lib/rails/commands/runner.rb:52:in `<top (required)>'
from /opt/rh/ruby193/root/usr/local/share/gems/bundler/gems/rails-f9749c2ef83b/railties/lib/rails/commands.rb:64:in `require'
from /opt/rh/ruby193/root/usr/local/share/gems/bundler/gems/rails-f9749c2ef83b/railties/lib/rails/commands.rb:64:in `<top (required)>'
from script/rails:6:in `require'
from script/rails:6:in `<main>'

Just to save your time (and mine): are you willing to help me debug and point out what is failing (in which case, I’m OK with that), or do you consider that the yum upgrade may have damaged my server enough that debugging here is not relevant, so I should format and re-install (in which case, I’m also OK)?


#12

So, @neca, I would probably start clean with the appliance from the website and, if it occurs again, try to figure out a recipe to recreate the issue. It’s hard to know what exactly is going on if we can’t eliminate yum updates, manageiq git repo changes, and user-entered data as possible suspects.


#13

Well, I removed the appliance and re-created a new one, and yes, everything is working OK.
I did not make any changes: no Active Directory link, no yum upgrade, no git pull.