Picking a NoSQL database for Metrics


#1

Continuing the discussion from Moving metrics out of postgres:

The other discussion was more centered around what may break if we move metrics out of postgres.
Here, I wanted to focus on the choice of the new metrics engine/database.

There are a few reasons for standardizing our metrics collection:

  1. Allow users to add custom metrics of their own to our database.
  2. Allow users to use their existing dashboards that already have adapters for our new database.
  3. Leverage GUI components that have already been written and integrated with our new database.

The first things that come to mind are sensu / statsd -> graphite, or logstash. But the horsepower requirements are a bit excessive.
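For context on how lightweight the ingestion side can be, the statsd wire format is simple enough to emit without any agent at all. A minimal sketch in Python, assuming a collector listening on the conventional localhost:8125 (the metric name here is invented for the example):

```python
import socket

def statsd_packet(name, value, metric_type="c"):
    """Build a StatsD plaintext packet, e.g. b'miq.requests:1|c'.
    'c' = counter, 'g' = gauge, 'ms' = timer.
    """
    return f"{name}:{value}|{metric_type}".encode()

if __name__ == "__main__":
    # Fire-and-forget UDP; 8125 is statsd's conventional default port.
    # Nothing breaks if no collector happens to be listening.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.sendto(statsd_packet("miq.disk.iops", 250, "g"), ("127.0.0.1", 8125))
```

Because it is plain UDP text, any custom metric a user wants to add is one `sendto` away, which speaks to reason 1 above.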

I want it to be able to collect data for disk arrays (as @rpo has been giving me valuable edge cases to keep in mind here).

I also want to collect more system-centric metrics so people like @akrzos do not need to install a separate monitoring solution because the one we use does not meet their needs or cannot easily be extended. And by extensible, I’d like that to mean grabbing an adapter already written from GitHub with minimal glue.

Elasticsearch is quite neat but has failed for me in the past once it grew in size. Graylog has had similar characteristics. I do not have experience with Logstash, but the demos are impressive. Graphite is a pain to install on the Mac, but was successful in my ~20 node system. The mentioned systems have matured since I’ve used them, as well.

@phil_griffiths
What type of solution have you used to collect metrics, and what should we keep an eye on? Where do you want to feed this data, and what adapters already exist to get the data into there?


#2

I’m primarily using Splunk as it’s so flexible and powerful. But it is very
expensive and not everyone needs or can afford such a Rolls Royce solution.
So we’re deploying a dual tier logging/reporting stack. Tier 1 is Splunk
with multiple types of ingestion from its own agent/forwarders to syslog-ng.

Tier 2 will be ELK based. Coupled with that we’ll use MQ type solutions
(RabbitMQ etc) to stream messages and complement those with
anonymisation/tokenisation based systems where we need to protect/mask user
sensitive data.

We have a predominantly VMware/HP based estate, although we have also
looked at OpenStack and RHEV as potential drop-in replacements for the
future. ManageIQ will provide us with a nice view on the pure virtualised
VMware estate.

I also collect data from our various SAN offerings:

  1. HP iSCSI LeftHand. I have to use a CLI interface to pull out what I
    need, and quite frankly what I get out is pretty awful to use/interpret.
  2. HP EVA block storage. Again CLI based extraction with a lot of parsing
    to get it into a format which is meaningful.
  3. HP 3Par multiple tiered arrays (v400 and 7200/7400 series). These log
directly into their own System Reporter component, which is MySQL based. I use
    the Splunk DB Connect app (free plug-in/adapter) to extract the
    tables/fields I need. For some metrics I have to calculate things like IOPS
    performance by summing various fields.
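That kind of derived-metric calculation is straightforward once the counters are out of the array. A minimal sketch, with hypothetical field names standing in for the actual System Reporter columns:

```python
def iops_from_counters(prev, curr, interval_s):
    """Turn two samples of cumulative read/write op counters into an
    IOPS rate. 'read_ops' and 'write_ops' are hypothetical column
    names; the real table layout depends on the array and firmware.
    """
    delta = ((curr["read_ops"] - prev["read_ops"])
             + (curr["write_ops"] - prev["write_ops"]))
    return delta / interval_s

# Two samples taken 60 seconds apart:
prev = {"read_ops": 1_000_000, "write_ops": 500_000}
curr = {"read_ops": 1_006_000, "write_ops": 503_000}
print(iops_from_counters(prev, curr, 60))  # 150.0
```

The same sum-and-divide pattern works whether the extraction happens in Splunk DB Connect, in SQL against System Reporter, or in glue code like this.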

Ideally I’d like to feed all this data into something like
Elasticsearch/MongoDB etc. As ES is now nicely coupled and developed with
Logstash and Kibana, it’s coming of age. ES now has distributed search, and
the imminent Kibana 4 release should have much more advanced dashboarding.
[This should keep it within sight of the likes of Splunk]

Any NoSQL field/value-paired solution will make life a lot easier. I would
certainly be starting with ELK as the primary choice.

Looking into my crystal ball, when we have mainstream SDN/NFV solutions
coupled in and around virtualised hardware infrastructure, and tailored
vApp type applications on top being dynamically orchestrated/managed, then
having a NoSQL data model is going to be the only way to go.


#3

@phil_griffiths Do you use collectd/sensu/other to populate realtime system statistics into a system like Graphite? Numeric statistics like CPU usage and network usage, rather than log/exception data.
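For reference, collectd's write_graphite plugin (and most other adapters) just speak Carbon's plaintext protocol, one "<path> <value> <timestamp>" line per metric. A minimal sketch, assuming a Carbon listener on its default plaintext port 2003 (the metric path is invented for the example):

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Carbon's plaintext protocol:
    '<dotted.path> <value> <unix-timestamp>\\n'.
    """
    ts = int(time.time() if timestamp is None else timestamp)
    return f"{path} {value} {ts}\n"

def send_lines(lines, host="127.0.0.1", port=2003):
    """Ship a batch of formatted lines to Carbon over TCP.
    2003 is Carbon's default plaintext listener port.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall("".join(lines).encode())
```

The protocol being this simple is a big part of why so many dashboards and collectors already have Graphite adapters.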

Sounds like the system you are referencing (point #3) feeds into MySQL. Similar to our current solution.

Thanks for the pointers to ELK and Elasticsearch. I have enjoyed ES in the past (as opposed to Solr).

People sure love Splunk.


#4

@kbrock never used collectd etc., unfortunately. I used rrdtool/Cacti in the past, which I guess is similar?
Spitting your stats out into something like ES and allowing visualisation via Kibana should be a great choice.
Met some guys from ES today. v1.2 has some funky features like federated and multi-dimensional search. Kibana v4 should also make the UI functions much richer and slicker.
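Since ES will index any field/value JSON you throw at it, getting numeric stats in is mostly a matter of settling on a document shape. A minimal sketch; the schema here (metric/value/@timestamp plus flat tags) is just one reasonable layout for Kibana-friendly filtering, not anything ES requires:

```python
import json
import time

def metric_doc(name, value, tags=None, ts_ms=None):
    """Build one metrics document as a JSON string, ready to POST to
    an Elasticsearch index. The field names are conventions chosen
    for this example, not an ES requirement.
    """
    doc = {
        "metric": name,
        "value": value,
        "@timestamp": int(time.time() * 1000) if ts_ms is None else ts_ms,
    }
    if tags:
        doc.update(tags)  # flat tags keep Kibana field filtering simple
    return json.dumps(doc, sort_keys=True)

print(metric_doc("cpu.user", 12.5, {"host": "node01"}, ts_ms=0))
```

One document per sample, POSTed to a date-stamped index, is the same pattern Logstash uses, so the resulting data lands somewhere Kibana already knows how to chart.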