Event support for Microsoft System Center


#1

System Center Operations Manager (SCOM) monitors both availability (events) and performance (capacity and utilization) for the datacenter and cloud.
After researching how to integrate SCOM events with ManageIQ, I have documented three different approaches in order of (my) preference. Please chime in with any feedback you have.

Option 1) SCOM Connector
A connector is a custom service or program that enables Operations Manager to communicate bidirectionally with third-party systems. See http://msdn.microsoft.com/en-us/library/hh328935.aspx
A ManageIQ connector would be written in PowerShell using the Operations Manager Connector Framework (OMCF) SDK. Two PowerShell scripts are required.
i. A setup script – to create a connector in SCOM with a subscription specific to ManageIQ. This subscription would have a polling interval setting and a criteria setting.
ii. A second script would be executed within ManageIQ and would read the events gathered by the connector for normalization by ManageIQ.

Pros:

  • Lightweight solution
  • High performance
  • SCOM, not ManageIQ, takes care of the polling and filtering and makes the events available after
    each poll.
  • We can leverage the same connector for importing capacity & utilization data in the near future.

Cons:

  • Requires modifying the customer’s SCOM installation by adding the ManageIQ connector. (This may be a problem for some customers.)

Option 2) PowerShell cmdlet
SCOM exposes events through a simple cmdlet (Get-SCOMAlert).
This could be polled every X minutes from ManageIQ, filtering on the event creation time and resolution state.
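
To make the polling idea concrete, here is a rough sketch of how the ManageIQ side might build the command for each poll. The helper name is made up, and the TimeRaised property, the timestamp format, and the use of both fields inside a -Criteria expression are assumptions to check against the SCOM version in use.

```ruby
# Hypothetical helper: builds the PowerShell command ManageIQ would run on each poll.
# Assumption: alerts can be filtered on ResolutionState and TimeRaised through
# Get-SCOMAlert -Criteria, and the timestamp format below is accepted.
def scom_alert_poll_command(last_poll, resolution_state: 0)
  since = last_poll.utc.strftime("%Y-%m-%d %H:%M:%S")
  criteria = "ResolutionState = #{Integer(resolution_state)} AND TimeRaised > '#{since}'"
  # ConvertTo-Json gives the Ruby side something easy to parse.
  %(Get-SCOMAlert -Criteria "#{criteria}" | ConvertTo-Json)
end

last_poll = Time.utc(2014, 6, 1, 12, 0, 0)
puts scom_alert_poll_command(last_poll)
```

After each poll the caller would store the current time as the next last_poll, so only newer alerts are returned the next time around.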

Pros:

  • Lightweight solution

Cons:

  • A broker to SCOM will have to be developed; otherwise a new connection
    will have to be opened and then closed each time the cmdlet runs.
    Note: right now SCOM cmdlets can be run directly from the VMM server
    but not remotely using WinRM via VMM. If I can get this to work, one
    broker can be created to VMM for both EMS refreshes and importing
    events.
  • SCOM has no PowerShell cmdlet that returns performance data,
    so this option cannot be reused for gathering capacity & utilization
    data.
  • Consideration will have to be given to the performance
    implications of filtering each event on resolution state and creation
    time.

Option 3) Orchestrator Runbook
System Center 2012 Orchestrator is a workflow automation tool that allows integration between heterogeneous environments. One use case is the transfer of data (such as events) between them.
A runbook can be designed in Orchestrator for ManageIQ that queries the SCOM database for events in one runbook activity and, in a second activity, exports them into a parsable format for ManageIQ to read.

Pros:

  • It will work.

Cons:

  • Customers will have to deploy System Center Orchestrator if they have not already done so.
    In my opinion, Orchestrator is better suited to more complex integrations, such as with a ticketing system (e.g. Remedy).

#2

FYI here is the prototype code for creating the SCOM Connector, inserting a Subscription to ManageIQ and then a sample script for reading the events gathered by the connector.


#3

I like Option 1. Although Option 2 is “simpler”, Option 1 sets us up for C&U collection. In addition it allows pre-filtering with the Subscription, which you don’t have in Option 2.

Some concerns:

  • Polling

“SCOM, not ManageIQ, takes care of the polling and filtering and makes the events available after each poll.”

That’s not quite right. We can’t loop on the server side as per your script, or it would loop forever and never return. Instead, you’d have to loop on the event catcher (Ruby) side and make two calls: one to get the events and then, after processing them, a second call to ack the events.

  • Data size
    There may be a lot of data that would have to be transferred over the wire. We may have to mitigate that by preprocessing on the server side.

  • Setup must be reentrant
    The Create/Update script pair would have to be a Find-or-Create/Update pair that would run every time the worker starts. This would allow changes to the subscription “on-the-fly”. Additionally, if a worker just dies, it needs to come back up and not fail on things like the connector already being there.

  • Verify connection performance.
    On the EMS Refresh side, the performance on the connection seemed awful. Since this would have to make two calls per iteration, that might be very expensive.
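
To illustrate the polling and reentrant-setup points, here is a minimal sketch of what one iteration on the event catcher side could look like. The client class is an in-memory stand-in, and all of its method names are hypothetical; in a real worker each method would be a remote PowerShell call against the connector.

```ruby
# In-memory stand-in for the SCOM side; every name here is hypothetical.
class FakeScomConnectorClient
  attr_reader :connectors, :pending

  def initialize(pending_events)
    @connectors = {}
    @pending = pending_events.dup
  end

  # Find-or-create, then update: safe to run every time the worker starts.
  def ensure_connector(name, criteria)
    connector = (@connectors[name] ||= {})
    connector[:criteria] = criteria
    connector
  end

  # Call 1: return the events gathered since the last acknowledgement.
  def fetch_events(_name)
    @pending.dup
  end

  # Call 2: acknowledge only what was processed, so a crash loses nothing.
  def acknowledge(_name, event_ids)
    @pending.reject! { |event| event_ids.include?(event[:id]) }
  end
end

# One iteration of the loop, run on the event catcher side.
def poll_once(client, name)
  events = client.fetch_events(name)
  processed = events.map { |event| yield(event); event[:id] }
  client.acknowledge(name, processed) unless processed.empty?
  processed
end

client = FakeScomConnectorClient.new([{ id: 1, name: "disk full" }, { id: 2, name: "host down" }])
# Running setup twice simulates a worker restart; the second call only updates.
2.times { client.ensure_connector("ManageIQ", "Severity = 2") }
seen = []
poll_once(client, "ManageIQ") { |event| seen << event[:name] }
```

The worker would call poll_once inside its own sleep loop. Acknowledging after processing means an event may be seen twice after a crash, but never dropped.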


#4

Jason,

One of the beauties of Option 1 was that a connection to SCOM is made once and a “short list” of pre-filtered events is read periodically by ManageIQ, as illustrated in the prototype. Since we cannot leverage that in ManageIQ we can investigate a couple of alternatives:
i) I might have removed the delay in connecting to the WinRM service, but I need to verify this through conclusive testing.
ii) If this delay is unavoidable, maybe we can implement a broker. A broker might have to be developed for the EMS Refresh; that depends on i) above.
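
For what it is worth, the broker idea in ii) could be as small as the sketch below: open one session lazily and reuse it for every command, so the connection cost is paid once. The class and the stand-in session are hypothetical; a real session would wrap a WinRM connection to the VMM server.

```ruby
# Hypothetical broker: opens one session on first use and reuses it afterwards.
class SessionBroker
  attr_reader :opens

  # The block is any code that returns a session object responding to #run.
  def initialize(&opener)
    @opener = opener
    @session = nil
    @opens = 0
  end

  def run(command)
    session.run(command)
  end

  private

  def session
    @session ||= begin
      @opens += 1
      @opener.call
    end
  end
end

# Stand-in session that just echoes the command instead of talking to WinRM.
EchoSession = Struct.new(:id) do
  def run(command)
    "session #{id}: #{command}"
  end
end

broker = SessionBroker.new { EchoSession.new(1) }
results = ["Get-SCOMAlert", "Get-SCVirtualMachine"].map { |cmd| broker.run(cmd) }
```

Both the event polling and the EMS refresh could then share the same broker instance, which is the reuse described above.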

Setup must be reentrant.
Of course. This exceeded the scope of the prototype.