Provider Driver Architecture


#1

We should change the architecture of ManageIQ to make adding new Providers easy. Preferably, it should be as easy as adding a new ActiveRecord model derived from ExtManagementSystem and a new directory to do the common communication to this Provider. Once that is done, then I would expect that Inventory (EmsRefresh), Event Catching, Metrics, and Operations (including Provisioning) just work.

I expect the UI to seamlessly work as well. This may mean that we move much of the logic from Controllers/Views into Model methods.

Let’s brainstorm about this here.


#2

@blomquisg and I have been discussing this for a long time since the last big meetup. We have a proposal in the works, and we can use that as the start of a discussion.


#3

Speaking of architecture, is there any plan to integrate technologies such as BPM and BRMS? Looking at the CloudForms model, it resembles a Complex Event Processing (CEP/BRMS) engine that launches workflows (BPM) based on rules (BRMS).

Components could be decoupled the same way OpenStack components are: one or more databases to store long-term data (history, performance data, etc.) and a message queue to interconnect components. Each component could then evolve independently, making it easier to contribute to a specific component.

My 2 cents…


#4

@chessbyte, I, and others have talked about this in the past, and many others have expressed interest. I think that might be something we’d want to do if we can understand how it fits into the whole ManageIQ experience.

This particular thread though is about rearranging and refactoring the code around the existing virtualization providers (our bread and butter, at the moment) so that it can be made a lot easier for new providers to be added. A BRMS would be a completely new kind of thing.


#5

Well, my comment touched on two different (yet somewhat connected) topics. The first was about BPM/BRMS, and I agree that it is off topic. Do you think it deserves its own Development/Blueprints topic?

The second topic was the separation of components into a more modular architecture, for example one relying on a DB and an MQ. This aspect is, IMHO, still on topic. I am no expert in ManageIQ’s current architecture or code, but I would see it using a factory pattern, with each provider implementing it. My point is that the same could apply to other objects in the application, making it easier to extend and maintain.


#6

Definitely

Are you suggesting we have a more consistent way to access the ManageIQ database? If so, that’s an interesting idea, but one I can’t think of any use cases for.

Generally, provider interactions shouldn’t need the ManageIQ database and could probably live self contained. Not that there aren’t specific models that need to be created for a new provider, but that model should probably delegate to code that doesn’t care about the database. This way the model is a thin layer of glue between ManageIQ the Rails app and ManageIQ’s provider layer.

If the code does need to access the database, then we have the REST API, which is a pretty consistent layer.


#7

My development knowledge is fairly dated, so I may say something stupid in the following lines. Please be kind :blush:

I agree. The provider should not care about the database and should simply provide a consistent request/reply model: an API. This is why I mentioned a factory pattern: the provider abstract class (ExtManagementSystem) would define the API, and the concrete implementations would do the real work, returning status and data in a consistent format. Translating the native format into the ManageIQ standardized format should be left to each provider implementation.
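
To make the factory idea a bit more concrete, here is a minimal plain-Ruby sketch. Every name in it (ProviderDriver, FakeOpenstackDriver, list_vms, the hash keys) is hypothetical and invented for illustration; it is not ManageIQ’s actual API, and the real ExtManagementSystem is an ActiveRecord model rather than a plain class.

```ruby
# Hypothetical sketch only: none of these names come from ManageIQ itself.
class ProviderDriver
  REGISTRY = {}

  # Each concrete driver registers itself under a short type name.
  def self.register(type)
    REGISTRY[type] = self
  end

  # Factory method: build the driver registered for the given type.
  def self.build(type, **options)
    klass = REGISTRY.fetch(type) { raise ArgumentError, "unknown provider type: #{type}" }
    klass.new(**options)
  end

  # The contract every driver must fulfil: return VMs as an array of
  # hashes with :name and :power_state keys (an assumed common format).
  def list_vms
    raise NotImplementedError, "#{self.class} must implement list_vms"
  end
end

# A stand-in driver that translates an assumed native format into the
# common one. The "status" values are illustrative.
class FakeOpenstackDriver < ProviderDriver
  register :openstack

  def initialize(native_servers: [])
    @native_servers = native_servers
  end

  def list_vms
    @native_servers.map do |server|
      { name: server["name"], power_state: server["status"] == "ACTIVE" ? "on" : "off" }
    end
  end
end

driver = ProviderDriver.build(:openstack, native_servers: [{ "name" => "web1", "status" => "ACTIVE" }])
driver.list_vms # => [{ name: "web1", power_state: "on" }]
```

The caller only ever sees the common hash format; the translation from the native format stays inside the driver.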

To me, the REST API is just a way of exposing the features of the application to other applications, while the WebUI exposes them to human beings. So, in an MVC approach, it would still be a view. And the API : ManageIQ Features & Class Mapping would support this idea.


#8

As @Fryguy pointed out, I really do want to write up some ideas about this.

Here’s a super high level view of the ideas that I’ve been kicking around for this. First, we break down the provider interactions into three different areas:

  1. Inventory Collection (a.k.a. “ems refresh”)
  2. Event catching
  3. Metrics collection (a.k.a. “cap and u”)

Inventory Collection

Currently, this process is nicely separated into two separate steps:

a. Collect inventory from the provider (build a huge hash of the entire inventory)
b. Save the inventory hash into the ManageIQ database

What’s nice about this current process is that the process of gathering data is separated from the deep knowledge of the ManageIQ database fields.

However, there’s still the problem of knowing exactly how to construct the massive inventory hash in step (a).

The suggestion to make this even easier to consume from the outside is to introduce the idea of an inventory event handler.

This way, someone writing the inventory collector for a brand new provider would only need to know how to collect the inventory from the provider and how to emit the events that our inventory event handler cares about. Under the covers, the inventory event handler would build the appropriate inventory hash and update the database accordingly. In the future, this would also give us much more flexibility to change the “save inventory” process however we want without impacting the inventory collection process. Today, it’s absolutely necessary to collect the entire inventory before saving it to the DB. Tomorrow, maybe we could save it in stages, or who knows…

Inventory Collection Sticky Points

The following points make this a little more difficult than “a simple matter of typing on a keyboard”:

  • create the inventory event handler
  • providing a way to link different inventory items together
    • i.e., hosts need to own VMs … how would one emit a “new host” event, then emit a “new VM” event, and still somehow indicate that the VM and the host are related?
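
One possible way to handle the linking problem, purely as a sketch: have each “new VM” event carry a reference to its host, and let the handler park VMs whose host has not been seen yet. The event names (:new_host, :new_vm), the payload keys, and the shape of the resulting hash are all assumptions made up for this example; the real inventory hash looks different.

```ruby
# Hypothetical sketch: event names and hash layout are assumptions.
class InventoryEventHandler
  attr_reader :inventory

  def initialize
    @inventory = { hosts: {}, vms: {} }
    # VM refs that arrived before their host, keyed by host ref.
    @pending_vms = Hash.new { |hash, key| hash[key] = [] }
  end

  def emit(type, payload)
    case type
    when :new_host then add_host(payload)
    when :new_vm   then add_vm(payload)
    else raise ArgumentError, "unknown inventory event: #{type}"
    end
  end

  private

  def add_host(payload)
    ref = payload.fetch(:ref)
    host = { name: payload.fetch(:name), vm_refs: [] }
    @inventory[:hosts][ref] = host
    # Attach any VMs that were emitted before this host showed up.
    host[:vm_refs].concat(@pending_vms.delete(ref) || [])
  end

  def add_vm(payload)
    ref = payload.fetch(:ref)
    host_ref = payload.fetch(:host_ref)
    @inventory[:vms][ref] = { name: payload.fetch(:name), host_ref: host_ref }
    host = @inventory[:hosts][host_ref]
    if host
      host[:vm_refs] << ref
    else
      @pending_vms[host_ref] << ref
    end
  end
end

handler = InventoryEventHandler.new
handler.emit(:new_vm, { ref: "vm-1", name: "web1", host_ref: "host-1" })
handler.emit(:new_host, { ref: "host-1", name: "esx1" })
handler.inventory[:hosts]["host-1"][:vm_refs] # => ["vm-1"]
```

With this approach the collector can emit events in any order, at the cost of the handler keeping some state until the related item arrives.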

Event Collection

This is another area where we’re almost there.



Fig 1: Openstack Event Model

The event model (Fig 1) shows the various parts involved in collecting and processing events. This particular illustration uses Openstack as an example.

This is broken down into a few different areas that we really care about here:

  1. event monitor
    • see OpenstackEventMonitor
    • responsible for listening for events from openstack
    • hands events to the event catcher
  2. event catcher
    • see EventCatcherOpenstack
    • responsible for catching events from openstack
    • registers the events with ManageIQ database
  3. event handler
    • see EventHandler
    • responsible for retrieving events from ManageIQ database and deciding what to do with them
    • reads the event_handler configuration file and performs configured action based on event contents

To make event collection provider-driver friendly, I honestly believe that it’s a matter of drawing a line in the sand in the Event Monitor -> Event Catcher -> Event Handler chain and deciding that one part lives on the Provider Driver side, and the other part lives in the core ManageIQ side.

My first hunch here is that the Event Monitor and possibly some of the Event Catcher lives in the Provider Driver side. That leaves the majority of the Event Catcher and likely the entire Event Handler in the core ManageIQ side.
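
To illustrate where that line might be drawn, here is a small sketch of the three stages as plain Ruby objects. The class names mirror the roles above but are stand-ins, not the real OpenstackEventMonitor, EventCatcherOpenstack, or EventHandler; the event fields, the action map, and the use of an in-memory array in place of the ManageIQ database are all assumptions.

```ruby
# Hypothetical sketch: an array stands in for the ManageIQ database.

# Provider Driver side: knows how to listen to the provider.
class FakeEventMonitor
  def initialize(native_events)
    @native_events = native_events
  end

  def each_event(&block)
    @native_events.each(&block)
  end
end

# The boundary: converts native events into a common shape and records them.
class EventCatcher
  def initialize(monitor, store)
    @monitor = monitor
    @store = store
  end

  def run
    @monitor.each_event do |native|
      @store << { type: native.fetch("event_type"), target: native.fetch("instance_id") }
    end
  end
end

# Core ManageIQ side: reads stored events and looks up the configured action.
class EventHandler
  def initialize(store, actions)
    @store = store
    @actions = actions
  end

  # Returns [target, action] pairs; unknown event types map to :ignore.
  def process
    @store.map { |event| [event[:target], @actions.fetch(event[:type], :ignore)] }
  end
end

store = []
monitor = FakeEventMonitor.new([{ "event_type" => "instance.created", "instance_id" => "i-1" }])
EventCatcher.new(monitor, store).run
EventHandler.new(store, { "instance.created" => :refresh }).process
# => [["i-1", :refresh]]
```

In this sketch only FakeEventMonitor and the field mapping inside EventCatcher know anything about the provider, which matches the hunch above about where the split could go.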

Event Collection Sticky Points

None?

Metrics Collection

Somewhat similar to Inventory Collection, Metrics Collection has a very particular expectation for the format of the data before it can actually process and save it to the ManageIQ database.

I actually have not looked at Metrics Collection in quite some time. It would take a decent amount of time to come back up to speed on how to even start breaking this up.

However, one quick idea is to look at simply how to separate out the actual collection of data from the formatting and processing of the data. This remains, by far, the trickiest place to tease out a “provider-driver” friendly approach. The metrics collection logic is hugely ingrained in two different areas:

  1. Metrics and metric collection logic that are very specific to vCenter metrics.
  2. ManageIQ reports that depend on specific

Metrics Collection Sticky Points

All of it?

One of the main problems here is that the data is expected to look and feel very much like vCenter metrics data. This includes the time interval for the metrics. For instance, vCenter produces metrics data every 20 seconds. However, Rhevm produces metrics data every minute. This means that for every single data point from Rhevm, we have to manufacture two additional “fake” data points in order to match the 20-second cadence set by vCenter. Openstack allows the Openstack administrator to define a completely customized metric collection period.
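
As a rough illustration of the cadence problem, the sketch below expands one-minute samples into 20-second points by repeating each value forward. The method name, the hash keys, the integer-second timestamps, and the choice to repeat the value (rather than interpolate) are all assumptions for the example, not a description of how the existing metrics code manufactures the extra points.

```ruby
# Hypothetical sketch: repeats each sample forward to fill a 20-second cadence.
TARGET_INTERVAL = 20 # seconds, the cadence expected by the existing metrics code

# samples: array of { timestamp: Integer seconds, value: Numeric }
# source_interval: seconds between the provider's own samples
def expand_to_target_cadence(samples, source_interval)
  unless (source_interval % TARGET_INTERVAL).zero?
    raise ArgumentError, "source interval must be a multiple of #{TARGET_INTERVAL}s"
  end

  copies = source_interval / TARGET_INTERVAL
  samples.flat_map do |sample|
    (0...copies).map do |i|
      {
        timestamp: sample[:timestamp] + i * TARGET_INTERVAL,
        value:     sample[:value],
        synthetic: i > 0 # true for the manufactured points
      }
    end
  end
end

points = expand_to_target_cadence([{ timestamp: 0, value: 42.0 }], 60)
points.map { |point| point[:timestamp] } # => [0, 20, 40]
points.count { |point| point[:synthetic] } # => 2
```

A provider with a configurable period, such as Openstack, would not fit this helper unless its period happened to be a multiple of 20 seconds, which is part of why this area is hard.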

It’s possible that some of this legacy constraint could be kept intact and well hidden from the provider driver concept. However, we still have to figure out how to draw a line in the sand between provider driver and core ManageIQ.

Summary

These are just some ideas that have been floating around my head. Please add your comments. I’d love to hear other ideas. Even if it’s just to poke holes.

Regardless of what paths we take, clearly we have a decent amount of research ahead to really develop a workable plan.

Also, I do plan on expanding on this quite a bit. So, let’s get different ideas out in the open so we can evolve these ideas in the right direction.

Thanks!


#9

The general idea is to remove as much as possible that is provider specific out of the Rails part of the application (vmdb directory), and into the standalone part of the application (lib directory). Ultimately, I could even see those provider specific things becoming entire gems of their own allowing for true pluggability.

Some more stuff I remembered…

Provider Actions & Validations

  • Actions (e.g. start vm, stop vm, clone vm, etc.) are in the models for some providers, such as VMware, and should be separated out.
  • Validations (e.g. can’t call start on a Vm already running) are buried in the models, and are really tailored for how the UI needs to see it. I think this should all come out of the models and be provided at the pluggable layer. For example, I could see the provider layer providing a method can_start_vm?(current_state), and the model delegating to that method handing off the current state it knows about from the database.
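
The delegation I have in mind might look roughly like the sketch below. Only can_start_vm?(current_state) comes from the description above; the module name, the plain-Ruby Vm class standing in for the ActiveRecord model, and the "on"/"off" state strings are assumptions for illustration.

```ruby
# Hypothetical sketch: the provider layer knows the rules, not the database.
module FakeProviderOperations
  # Pure function of the state it is handed; no database access.
  def self.can_start_vm?(current_state)
    current_state != "on"
  end
end

# Stand-in for the Rails model: a thin layer of glue that supplies the
# state it knows about and delegates the decision.
class Vm
  attr_reader :power_state

  def initialize(power_state, operations: FakeProviderOperations)
    @power_state = power_state
    @operations = operations
  end

  def can_start?
    @operations.can_start_vm?(power_state)
  end
end

Vm.new("off").can_start? # => true
Vm.new("on").can_start?  # => false
```

Because the provider-side method takes the state as an argument, it can be tested and shipped without Rails or a database.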

Can’t be separated out

  • On the Rails side of the application we rely somewhat heavily on STI. This requires that a model for that subclass get created in the models directory. I guess we could create the model in the lib directory and then have some registration mechanism where it gets copied in, but that seems really janky to me.
  • On the worker side of the application, we use separate workers for things like event catching. Much like the database model, we would still have to have new subclasses, or perhaps a registration concept.
    • A subpoint of this is the worker configuration, which is currently stored in vmdb.tmpl.yml.

I’m sure I’ll have more, but this is off the top of my head after reading your post, @blomquisg. Can you create a GitHub issue with the proposal in Markdown, and keep it updated as more thoughts float around in this thread?


#10

Opened issue: https://github.com/ManageIQ/manageiq/issues/519

Let’s move any discussion there. I’ve referenced this talk article in the issue body for posterity.


#11

I realize that this topic has not been addressed here for quite some time, and I have followed the issues and discussions that were moved to GitHub, eventually arriving at https://github.com/ManageIQ/manageiq/issues/1272. As a side note, for future questions where the topic has been moved over to GitHub, should I post the inquiry under the topic on GitHub instead?

I was wondering if anyone could provide an update on the status of this topic? Has any progress been made on moving to a standardized, formalized way of implementing new providers? And if so, has any documentation out of that effort been put together? My interest stems from a desire to be able to add custom providers for cloud offerings where OpenStack is utilized as the underlying infrastructure.

Thank you in advance for any replies.