Sending alerts via snmp traps not working


#1

I’m trying to send alerts using snmp traps to another monitoring tool (check_mk). After configuring the alert to send an snmp trap and adding it to a profile (in my case it’s monitoring Openshift), I can see the alert event in /miq_policy/log, but the snmp traps don’t arrive at my server.
I’m monitoring the cloudforms server with tcpdump and no snmp trap is going out.


#2

I’m not sure if this is available without a Red Hat login, but here is an old-but-useful document describing how to set up SNMP traps with the old ManageIQ EVM 4.x product. I think most of it is still applicable now.

pemcg


#3

Many thanks! I’ve followed the pdf (I had it), but I don’t get any messages.
I’m also trying email notifications. The test email works, but I cannot get any alert notifications. Maybe I don’t understand how to configure an alert for Openshift.


#4

Hi.

  • Which manageiq version are you running?
  • Which openshift entity are you trying to alert on, and on what condition? [Note the confusing “Host / Node”, which is the UI name for hosts in the virtual machines world, vs “Node”, which is a container node :frowning: ]

The bad news is we never made sure Alerts defined in UI work for containers. If they do it’s accidental :flushed:. Anyway if UI allows their creation and they don’t work, that’s a bug!

Also keep in mind that ManageIQ alerts are based on data from inventory refresh, and current container refresh speed might be too slow for monitoring needs… Do you have an “acceptable time from problem to alert”?

We’ve been working on defining and computing alerts in Hawkular (and now working on similar in Prometheus), and making ManageIQ pull those. This approach will scale better (and trigger much faster), but is WIP, isn’t yet documented or quite ready for consumption…

cc @moolitayer


#5

Sorry, I’m told this is not precise, ignore this part.


#6

Hi,
I’m using the redhat version, cloudforms 4.5.
I’m testing the alerts with “Container Operation: Replicator Successfully Created Pod”. The example condition I used is:

Provider.Management Events : Container Name CONTAINS “example”

When I do a new schedule for a container named switcher-example, I don’t get any notification.

I’m also testing a policy with the condition “Pod scheduled”, testing it with email, snmp and log message, and I get nothing.


#7

Hey @jgpelaez, did you get this going?

Thanks


#8

Hi @pemcg and @cben

Do you have any further guidance on this? I am also looking at using the Openshift provider integration in CF to send snmp traps to an external snmp trap receiver for certain events in OCP. The document helps, but I would expect CF needs to provide a MIB to the external snmp receiver?

Thanks


#9

Perhaps trying to get a basic example going would be easiest. It’s CF 4.5 and OCP 3.5. I want CF to send an snmp alert if an ocp node reboots.

Thanks


#10

Hi

A couple of things: the “Based on Host/Node” alert is intended for infrastructure hosts (such as ESXi) or OpenStack Nova nodes, and not OpenShift container nodes (which are a different object type). There’s a separate “Based on Node” alert that is intended to handle OpenShift node alerts from Prometheus (CloudForms 4.6).

The OID base in the original PDF doc is 1.3.6.1.4.1.33482 which no longer exists (The Private Enterprise Number 33482 was owned by the original ManageIQ company). We should probably use 1.3.6.1.4.1.2312 (RedHat Software) now, I’ve created a BZ for this to be changed: https://bugzilla.redhat.com/show_bug.cgi?id=1551519.

That said, I’d still expect traps to be sent, even using the incorrect OID. You may find that there’s not much useful information available in a trap from a control alert (such as actual node name). You may get better results calling your own ‘upstream’ method from the instance that handles the event itself (such as /System/Event/EmsEvent/Kubernetes/NODE_NOTREADY). At this point earlier in the event handling flow you have access to the event stream object, so can extract more useful information, i.e.

cluster    = $evm.vmdb(:ems, $evm.root['event_stream'].ems_id).name
project    = $evm.root['event_stream'].container_namespace || "N/A"
pod        = $evm.root['event_stream'].container_group_name || "N/A"
container  = $evm.root['event_stream'].container_name || "N/A"
event_type = $evm.root['event_stream'].event_type
message    = $evm.root['event_stream'].message

You can then forward this to a monitoring platform or email using something like:

to      = $evm.object['to_email_address']
from    = $evm.object['from_email_address']
subject = "#{event_type} event received from cluster #{cluster}"
body    = "A #{event_type} event was received from cluster #{cluster}<br><br>"
body    += "Project: #{project}<br>"
body    += "Pod: #{pod}<br>"
body    += "Container: #{container}<br>"
body    += "Message: #{message}"
$evm.execute('send_email', to, from, subject, body)

There’s also a $evm.execute('snmp_trap_v2', inputs) if you want to send a trap at this stage.
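As a rough sketch of that call: I’m assuming here that the inputs hash takes :host, :port, :trap_oid and :object_list keys, with :oid, :var_type and :value per varbind. That is my reading of what snmp_trap_v2 accepts, so check it against the source for your version. The receiver hostname, the trap and varbind OIDs, the sample values and the build_trap_inputs helper are all hypothetical placeholders:

```ruby
# Sketch only: the key names (:host, :port, :trap_oid, :object_list and
# :oid/:var_type/:value per varbind) are assumptions about what
# snmp_trap_v2 accepts. Verify them against your ManageIQ version.

# Base OID from the original PDF; every sub-OID below is made up for illustration.
BASE_OID = "1.3.6.1.4.1.33482".freeze

# Build the inputs hash from values extracted from the event stream.
def build_trap_inputs(receiver, event_type, cluster, message)
  {
    :host        => receiver,
    :port        => 162,                 # standard SNMP trap port
    :trap_oid    => "#{BASE_OID}.0.1",   # hypothetical trap OID
    :object_list => [
      {:oid => "#{BASE_OID}.1.1", :var_type => "OctetString", :value => event_type.to_s},
      {:oid => "#{BASE_OID}.1.2", :var_type => "OctetString", :value => cluster.to_s},
      {:oid => "#{BASE_OID}.1.3", :var_type => "OctetString", :value => message.to_s}
    ]
  }
end

# Sample values; in the Automate method you would pass the variables
# extracted from $evm.root['event_stream'] instead.
inputs = build_trap_inputs("trap-receiver.example.com", "NODE_NOTREADY",
                           "ocp-cluster", "Node is not ready")

# $evm only exists inside an Automate method, so guard the call.
$evm.execute('snmp_trap_v2', inputs) if $evm
```

Keeping the hash construction in its own method means you can inspect what would be sent (for example by logging inputs) before pointing it at a real receiver.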

Hope this helps,
pemcg


#11

Hey Pete, thanks, that is very useful. The requirement is to use snmp traps too, as this will allow integration with the existing alerting and escalation system. Still wondering how MIBs fit into the approach. Usually the snmp trap receiver would require a MIB (from CF) to translate the trap…


#12

There doesn’t seem to be a MIB for ManageIQ. The BZ that I created suggests that one should be created, and references a BZ for Ovirt that provided a similar enhancement for that project.


#13

Yes thanks noticed the MIB requirement and info in your BZ. Thanks again, all very useful!


#14

Hello, I’ve got gaprindashvili-2.20180313094615_ with a minishift container provider set up. I am trying to see what is achievable via CF4.6 and Alerts for OCP providers. Ideally I want to use Control > Explorer > Alerts and then send emails. Should this work? The kicker is that we don’t want to use custom code (which works fine) to send the emails…

Thanks


#15

@dan AFAIK this will not work. Alerts with any container-related Driving Event are currently unimplemented (although the UI allows you to define them :frowning: ): https://bugzilla.redhat.com/1494599

I think sending email is also available as a Policy action?
Hmm, seems you need to first create a custom email action: http://manageiq.org/docs/reference/euwe/doc-Policies_and_Profiles_Guide/miq/#creating-an-e-mail-action
So you could try defining a Pod Control Policy, assign Pod Container Unhealthy event, and assign the Send Email action. (Plus the usual ceremony to let the policy run — assign it to a Policy Profile, assign that profile to the provider…)

  • You should also be aware of https://bugzilla.redhat.com/1367114 — openshift policies are skipped if the event arrives before ManageIQ has seen the target (pod in this case) in inventory refresh. I hope for Container Unhealthy this won’t be much of a problem, as this event is not immediate after pod creation, and refresh in gaprindashvili / 4.6 is much faster.

Note that policies don’t have rate limiting. If that works, it should send a mail every time the event arrives from openshift — might be very noisy! Openshift tends to repeat events like “failed”, “unhealthy” etc at high frequency…
You can run oc get event --watch --all-namespaces to get a feeling what’s spam and what’s useful.
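If you do end up with custom code, the missing rate limiting could be approximated by remembering when each (event type, target) pair last produced a notification. A minimal sketch in plain Ruby, not anything ManageIQ provides: the EventThrottle class, the 10-minute window and the sample event name are my own assumptions, and inside ManageIQ the state would have to be persisted somewhere, since an in-memory hash does not survive between method runs.

```ruby
# Suppress repeated notifications for the same event type and target.
# Hypothetical helper, not part of ManageIQ.
class EventThrottle
  def initialize(window_seconds = 600)
    @window    = window_seconds
    @last_sent = {}   # [event_type, target] => Time of last notification
  end

  # Returns true if a notification should go out now, false if the same
  # event for the same target was already notified within the window.
  def notify?(event_type, target, now = Time.now)
    key  = [event_type, target]
    last = @last_sent[key]
    return false if last && (now - last) < @window

    @last_sent[key] = now
    true
  end
end

throttle = EventThrottle.new(600)
t0 = Time.at(0)
puts throttle.notify?("POD_UNHEALTHY", "heapster-0d26w", t0)        # first occurrence
puts throttle.notify?("POD_UNHEALTHY", "heapster-0d26w", t0 + 30)   # repeat inside the window
puts throttle.notify?("POD_UNHEALTHY", "heapster-0d26w", t0 + 601)  # window has passed
```

This prints true, false, true: the repeat 30 seconds later is dropped, and the event is let through again once the window has elapsed.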

I must admit that even after working on Policies/Alerts, I have little idea what’s the intended division of labor between them — when do you want a Policy, when do you want an Alert, what’s the difference in cases where both would work (which is not this case)…


#16

hey @cben, thanks for the info.

I am not sure if the policy-based approach is working. I am not expecting an email, since the miq docker doesn’t have a working smtp relay, but I was hoping to find the attempted email send and/or log entries for the policy “trigger” somewhere?

This log entry should have done something…?
[----] I, [2018-03-15T04:35:35.097424 #276:11a3140] INFO -- : MIQ(MiqEventHandler::Runner#get_message_via_drb) Message id: [14013], MiqWorker id: [6], Zone: [default], Role: [event], Server: [], Ident: [ems], Target id: [1], Instance id: [], Task id: [], Command: [EmsEvent.add], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [], Data: [], Args: [{:event_type=>"POD_SCHEDULED", :source=>"KUBERNETES", :timestamp=>"2018-03-15T04:35:23Z", :message=>"Successfully assigned heapster-xv7kx to localhost", :container_node_name=>nil, :container_group_name=>"heapster-xv7kx", :container_replicator_name=>nil, :container_namespace=>"openshift-infra", :container_name=>nil, :full_data=>{:timestamp=>"2018-03-15T04:35:23Z", :kind=>"Pod", :name=>"heapster-xv7kx", :namespace=>"openshift-infra", :reason=>"Scheduled", :message=>"Successfully assigned heapster-xv7kx to localhost", :uid=>"46666754-280a-11e8-b776-d253dbde9d6c", :container_group_name=>"heapster-xv7kx", :container_namespace=>"openshift-infra", :event_type=>"POD_SCHEDULED"}, :ems_id=>1, :container_group_ems_ref=>"46666754-280a-11e8-b776-d253dbde9d6c"}], Dequeued in: [3.73873549] seconds

Thanks


#17

Some progress… had to assign the policy to a pod to get the below.

[----] I, [2018-03-15T05:13:18.102257 #30841:11a3140] INFO -- : MIQ(GenericMailer.deliver) starting: method: policy_action_email options: {:to=>"spam@gmail.com", :from=>"cfadmin@cfserver.com", :subject=>"Policy Succeeded: Pod Unhealthy, for (MANAGEIQ::PROVIDERS::KUBERNETES::CONTAINERMANAGER::CONTAINERGROUP) heapster-0d26w", :miq_action_hash=>{:header=>"Policy Succeeded", :policy_detail=>"Policy 'Pod Unhealthy', Succeeded", :event_description=>"Pod Container Killing", :entity_type=>"ManageIQ::Providers::Kubernetes::ContainerManager::ContainerGroup", :entity_name=>"heapster-0d26w"}}
[----] E, [2018-03-15T05:13:20.046931 #30841:11a3140] ERROR -- : MIQ(GenericMailer.deliver) method: policy_action_email delivery-error: Connection refused - connect(2) for "127.0.0.1" port 25 attempting to resend

I did the same with the heapster replicator, but no policy event :frowning:

So would objects always have to be assigned to the policy to get the email about the event?


#18

A ManageIQ/CloudForms appliance or pod doesn’t come with a built-in SMTP relay, so you’d need to define an external SMTP relay in your “Outgoing SMTP E-mail Server” setting.

pemcg


#19

Yep, thanks. That’s expected for this dev env and ok. I don’t actually have to receive the email, I just want to see when and how they are sent.

At this stage it looks like custom code is the only way for any version of miq. Setting up some form of “gui” based alert for ocp provider events is just not there?

What is the roadmap for this? Need prometheus integration with ocp? I see the latest docker miq has Alert > Prometheus for the ocp provider…

Thanks


#20

Glad you’re making progress.
The log file you want is policy.log (last 1000 lines can be seen under Control -> Log).

Sorry, that’s the catch I alluded to as “the usual ceremony” but should have been more specific.
What can be assigned varies by target type:
http://manageiq.org/docs/reference/gaprindashvili/doc-Policies_and_Profiles_Guide/miq/index.html#profile-assign
For container nodes/pods/replicators, there are just 2 options: