Rethinking Providers and Managers


#1

The topic of Providers and Managers in the ManageIQ codebase comes up from time to time, but since we’re in the middle of looking at re-architecture strategies for core ManageIQ, this seems like a fitting time to actually tackle this problem.

Who should stop reading now?

If you’ve never looked at the ManageIQ data model, or if you’ve never cared about the relationships between provider-specific model objects. If you haven’t had coffee yet. If you really, really like spaghetti code and want it to stay that way. If you have never been struck by omphaloskepsis.

Problem

Some provider integrations use the Provider model, and others don’t. Some providers don’t seem to need it, while others that might benefit from it don’t use it. It boils down to: consistency.

But, like anything, once I started thinking about this, I started peeling back the layers of the onion and finding underlying problems that either could be fixed at the same time, or need to be addressed to move this topic forward.

So, I wanna lay out what I’m seeing, and I want others to chime in with what they’ve come across and any ideas they have for making this a better model.

Background (in semi-nonfictional prose)

Originally, there was ext_management_system, and it was good. It served the purpose of modeling virtual infrastructure management systems wherein each installation (or, endpoint) was a singular self-contained thing, and it was not overly complicated.

Enter OpenStack, stage left. This guy strolled into town and upended many classical ideas. Namely, the idea that a single connection definition could carry the burden of an entire management system. So began the hard coding, like which endpoint to use when fetching events from OpenStack.

But, OpenStack and Amazon teamed up to say, “What if you only want certain pieces of AWS or OpenStack (like Storage or Networking), and not the whole thing?” And, you better believe that the PMs were excited over this!

Engineering began wrangling code that enabled the idea of splitting out parts of providers into separate managers. And, from the rooftops, they shouted: "Ext_management_system is dead! Long live ext_management_system." Because the Manager was born, and it was ext_management_system that would maintain control over the various types of managers in a provider.

But, it wasn’t the end of the journey, because in a smoke-filled room sat @kbrock and @bdunne conspiring to make solid the amorphous nature of this new “provider” concept. Through whatever incantations were used that day, the Provider model was born. But, only the Foreman provider knew anything about the new model. And, it would remain so for countless days.

The great provider namespacing was well underway by this time, which would be lauded by some and berated by others. But, in the end, all provider implementations were relegated to their namespaces, to live out their days in isolation. Little did they know that this was only the beginning, and while namespacing separated brother from brother, the separation of repositories that followed would literally tear code families apart.

There lived the code, separated, never knowing that only some of its distant relative provider integrations were using the Provider model, while others were not.

Seriously though … why are you still reading?

Because, this is a real problem. If for nothing else other than consistency, we should figure out the right way to model Providers. But, there are lots of concepts to consider.

Rules for the game

Before diving in, let’s set some ground rules:

  1. Forget what you already know about the Provider and Manager models. Let’s assume we can start from scratch. It’s possible that later we’ll have to guide this conversation back to “How does this fit with the current model?” But, let’s cross that bridge later.
  2. Ignore ManageIQ zones and regions…for now. I don’t know what’s happening with those after the re-architecture (maybe nothing), but I don’t want to get caught up in those details right now.
  3. Let’s not go too far down the rabbit hole. I don’t want to talk about the details of each provider’s idiosyncratic model objects. It’s less important for this discussion to know that Container Management has pods or that Physical Infrastructure has firmware. It’s more important to know that some cloud managers use a “master account” while others use “subscriptions”.

Where do we start?

I don’t know. But, I’ll take a stab.

Provider accounts and tenants

One of the things that I think is going to drive this discussion is how providers create boundaries around things ManageIQ might consider “accounts”. Today, a provider in ManageIQ can have only one account.

For AWS this means that you have to decide whether you’re going to select a Master Account or a subsidiary account when setting up the provider.

Azure, on the other hand, has a concept of Subscriptions that are used to segregate billing. Currently ManageIQ treats each subscription as a unique provider instance.

For OpenStack, there’s really only one account, but OpenStack has tenants that create barriers between objects inside the provider.

And, vSphere has no real concept of tenants or any way to reliably separate objects into useful groups.

It’s important to understand all of the scenarios here and ensure that we’re capturing the concepts in a way that doesn’t immediately alienate other providers.
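Just to make the shape of the question concrete, here’s a minimal plain-Ruby sketch of one possible direction: a provider owning one or more “account boundaries”, each tagged with a provider-specific kind. These are not actual ManageIQ models; every class and attribute name here is hypothetical.

```ruby
# Hypothetical sketch only -- not real ManageIQ classes.
# The idea: a provider owns N "account boundaries" instead of exactly one account,
# and the boundary kind captures the provider-specific flavor.
AccountBoundary = Struct.new(:name, :kind, keyword_init: true)

class ProviderSketch
  attr_reader :name, :account_boundaries

  def initialize(name)
    @name = name
    @account_boundaries = []
  end

  def add_boundary(name:, kind:)
    account_boundaries << AccountBoundary.new(name: name, kind: kind)
  end
end

aws = ProviderSketch.new("Amazon")
aws.add_boundary(name: "master", kind: :master_account)
aws.add_boundary(name: "team-a", kind: :subsidiary_account)

azure = ProviderSketch.new("Azure")
azure.add_boundary(name: "billing-sub-1", kind: :subscription)

openstack = ProviderSketch.new("OpenStack")
openstack.add_boundary(name: "tenant-web", kind: :tenant)

vsphere = ProviderSketch.new("vSphere")
vsphere.add_boundary(name: "default", kind: :none)  # no natural boundary, one implicit entry
```

The point isn’t the names; it’s whether a single “boundary” concept can carry master accounts, subscriptions, and tenants without alienating providers that have none of those.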

I think if @bascar, @jonnyfiveiq, and @loicavenel can collectively brainstorm a bit on what we want the different Provider Accounts to look like from ManageIQ’s perspective, we can have a good discussion about how that model shapes up.

Provider services

ManageIQ eventually dealt with the problem of connecting to multiple services by introducing endpoints. But, this still heavily relies on the legacy authentications model that serves two very different masters: Provider connection credentials and provider object connection credentials.

Wait, what?!

The authentications table holds credentials for connecting to provider accounts (i.e., the provider API connection) and the same table holds credentials for connecting to hosts and VMs inside the provider. In general, this seems like a fine idea. It’s all just a bunch of credentials, but it’s the relationships and assumptions made about those relationships in the AuthenticationMixin that make it hard to tease apart these two concepts.
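If we ever do tease these apart, a rough sketch might simply give each concern its own type. The names below are purely hypothetical, not the existing Authentication/AuthenticationMixin code.

```ruby
# Hypothetical sketch -- splitting "how do I talk to the provider API?" from
# "how do I log in to a host/VM inside the provider?".
ProviderCredential = Struct.new(:userid, :password, :service, keyword_init: true)
GuestCredential    = Struct.new(:userid, :password, :target,  keyword_init: true)

provider_creds = [
  ProviderCredential.new(userid: "admin", password: "secret", service: :inventory)
]

guest_creds = [
  GuestCredential.new(userid: "root", password: "secret", target: "vm-42")
]

# With separate types, validations and relationships for each concern no longer
# have to share assumptions the way they do when everything lives in one table.
```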

Nonetheless, I believe that we have to deal with the notion that Providers have several services they offer. Those services can usually be categorized as one of:

  • Inventory API
  • Disk API (SmartState)
  • Event stream
  • Metrics data

Maybe it’s time to actually model this concept. Maybe that fits into the endpoints model. Not sure…
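To make that thought slightly more concrete, here’s a hypothetical sketch of modeling the four service categories as endpoint roles, each with its own connection details and credentials. Names are illustrative only, not the current endpoints implementation.

```ruby
# Hypothetical sketch -- service categories as endpoint roles.
SERVICE_ROLES = %i[inventory smartstate events metrics].freeze

Endpoint = Struct.new(:role, :url, :credential, keyword_init: true)

def build_endpoint(role:, url:, credential:)
  raise ArgumentError, "unknown role #{role}" unless SERVICE_ROLES.include?(role)
  Endpoint.new(role: role, url: url, credential: credential)
end

endpoints = [
  build_endpoint(role: :inventory, url: "https://keystone.example.com:5000",   credential: "keystone-creds"),
  build_endpoint(role: :events,    url: "amqp://rabbit.example.com:5672",      credential: "amqp-creds"),
  build_endpoint(role: :metrics,   url: "https://ceilometer.example.com:8777", credential: "keystone-creds")
]

endpoints.group_by(&:role)  # a provider could then answer "how do I fetch events?" explicitly
```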

Providers and Managers

Some of the initial business driver behind separating out the Managers from Providers was to create an isolation context such that someone might someday be able to use, say, AWS S3 without having to set up or care about an entire AWS provider (e.g., I don’t care about EC2 and instances, I only want S3 and buckets).

But, I don’t know that this is actually happening in the field. Or, maybe we didn’t do a good enough job separating it to be able to leverage it this way. It’s completely unclear to me. What I do know is that we ended up implementing something that doesn’t seem to be used the way we thought it would.

So, do we even need to expose this notion of separate managers at all? I mean, internally, it might make sense to segregate certain functionality into different scopes. But, what’s the point of exposing that concept to the end user, unless it’s going to be used in some interesting way?
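For illustration, here’s one hypothetical way the internal segregation could exist without surfacing separate manager objects to the user. This is a sketch only, with invented names.

```ruby
# Hypothetical sketch -- managers as internal scopes of one user-facing provider.
class ScopedProvider
  SCOPES = %i[compute network storage].freeze

  def initialize(name)
    @name   = name
    @scopes = SCOPES.to_h { |s| [s, []] }  # refresh/event code could work per scope internally
  end

  def add_inventory(scope, item)
    @scopes.fetch(scope) << item
  end

  # The user only ever sees one provider and its combined inventory.
  def inventory
    @scopes.values.flatten
  end
end

aws = ScopedProvider.new("Amazon us-east-1")
aws.add_inventory(:compute, "i-0abc123")
aws.add_inventory(:storage, "bucket-logs")
aws.inventory  # => ["i-0abc123", "bucket-logs"]
```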

User Experience

We have a lot of learning to do here, imo. We have separated out Networking, Storage, and Compute. But, while we support Virt Infra, Cloud, Containers, and Physical Infra for Compute, we only really support Cloud for Networking and Storage. Meaning that while we have top level navigation that shows Networking, you’re only going to find cloudy things there. But, if you look in Compute, you’re going to find all kinds of objects intermingling.

If I set up an AWS provider today, I automagically get a Networking and Storage manager. But, I didn’t know those things were created. If I go look at Networking and Storage later, I may be surprised to see things there relating back to AWS. And, I can’t create a Networking or Storage manager. I can only see them once they’re created after adding a Cloud Compute manager.

I don’t know how to clearly define what the user experience should be. I can just say that even when I limit the scope to just Providers and Managers, I’m overwhelmed and confused as a user. And, I want to make absolutely clear here that I’m not laying blame on the UI team for this. The notion of Providers and Managers was a massive cross-team effort, from PM, Backend Eng., and UI Eng.

Others?

Probably … I got tired typing though. I’ll try to revisit this. But, I wanted to get this out in the open and start the discussion.

If you have feedback, questions, comments, disagreements, tomatoes, or accolades, please don’t hesitate to add comments.


#2

I can only mention 10 people in a topic, apparently. So, I’m replying to my own post to mention more. And because I’m lazy and don’t want to “Invite” several people individually.

/cc @agrare, @bronaghs, @djberg96, @jameswnl, @juliancheal, @Ladas, @durandom


#3

/cc @bascar, @jonnyfiveiq, @loicavenel
/cc @simon3z, @abonas, @ovedo, @tzumainn


#4

/cc @Fryguy, @kbrock, @bdunne, @dclarizio, @chriskacerguis, @sdoyle


#5

First of all, enjoyable writing, especially the semi-nonfictional background part.

I think @blomquisg is in the wrong (or less glorious) career path :slight_smile:


#6

Hi @blomquisg
From the customer side, I want to draw your attention to the distributed nature of cloud providers and managers. There is a great opportunity here, because modern cloud management tools often suffer from the DevStack/PackStack syndrome, and ManageIQ is no exception.

For OpenStack, first of all, there are regions, each with its own entry point. And then, within regions, we have managers that manage compute, network, and storage.

For example, there is currently confusion in ManageIQ around cloud tenants. For one OpenStack installation I have to configure a separate provider for each OpenStack region, but cloud tenants are distributed across regions, even though a tenant is one and the same entity in the different regions.
This is just one of the problems that happened in my distributed environment. So were the nuances of the Cinder Manager.

For the user, there should be a way to select a region for cloud instances/volumes/networks, and after that the cloud managers related to the selected region for those tasks.
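To show what I mean, here is a rough sketch with invented names (not the real ManageIQ models): each region has its own manager and entry point, but the tenant is a single shared entity keyed by its own identity rather than duplicated per region.

```ruby
# Rough sketch with invented names -- one tenant entity shared by several
# region-scoped managers, instead of one copy of the tenant per region/provider.
Tenant = Struct.new(:uuid, :name, keyword_init: true)

class RegionManager
  attr_reader :region, :auth_url, :tenants

  def initialize(region:, auth_url:)
    @region   = region
    @auth_url = auth_url
    @tenants  = []
  end
end

shared_tenant = Tenant.new(uuid: "t-123", name: "web-team")

regions = [
  RegionManager.new(region: "RegionOne", auth_url: "https://r1.example.com:5000"),
  RegionManager.new(region: "RegionTwo", auth_url: "https://r2.example.com:5000")
]

regions.each { |r| r.tenants << shared_tenant }  # same tenant object, not a duplicate
```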


#7

/cc @aufi


#8

What did they put into your breakfast cereals? :slight_smile:

I think one other huge issue that we are seeing today is that originally everything in ManageIQ was a (virtual) Host, plus a bit of supporting stuff. It does not matter if that host is bare metal, a VM, or a container (as a mini-VM).
Model items like Middleware servers or complex software-defined networks don’t fit that model anymore. Of course a Java application server is a JavaVM with stuff on top of it, so the VM notion can be abused to model this, but it would be wrong.
What I want to say here is that a lot of the base entities would need to be changed too, to better accommodate those new types of managed things.

Another one that you indirectly mention is better support for linking of things. One wants to be able to “pull” on the slow-web-app string and find out that it is slow because in some completely different (but linked) part of the infrastructure a backup just started, which is taking the resources that would be needed to fulfil my business processes.


#9

Let’s not forget that we will have models that overlay others:

  • You can create an overlay network model (we have some code already doing that with Nuage), and then the overlay and the underlay will be related (one on top of the other) but will not match one to one. One machine in a virtual segment can have two IP addresses in two different underlying networks (sketched below).

  • Some objects need to exist only once in the system, even if you can see them in many places. Again, S3 storage will be one. How does that relate to affinities? What would be the best worker to act upon an object that can be accessed in different regions? Should a user that sees an object in one region see the same object in every region, since he can change it through the others?
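Here is a tiny sketch of the overlay point, with invented names and just enough structure to show the shape: the overlay-to-underlay relation has to be many-to-many, because one port on a virtual segment can carry addresses in several underlying networks.

```ruby
# Illustrative sketch only -- overlay segments mapped onto multiple underlay networks.
UnderlayNetwork = Struct.new(:name, :cidr, keyword_init: true)
PortMapping     = Struct.new(:underlay, :ip_address, keyword_init: true)
OverlaySegment  = Struct.new(:name, :ports, keyword_init: true)

underlay_a = UnderlayNetwork.new(name: "dc-east", cidr: "10.1.0.0/16")
underlay_b = UnderlayNetwork.new(name: "dc-west", cidr: "10.2.0.0/16")

segment = OverlaySegment.new(
  name:  "blue-segment",
  ports: [
    # one machine on the virtual segment, two addresses in two different underlays
    [PortMapping.new(underlay: underlay_a, ip_address: "10.1.5.20"),
     PortMapping.new(underlay: underlay_b, ip_address: "10.2.9.31")]
  ]
)

segment.ports.first.map(&:ip_address)  # => ["10.1.5.20", "10.2.9.31"]
```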


#10

First, a big :thumbsup: for tackling this. The multiple managers thing makes sense in theory, but I’ve never seen the benefits in reality; coupled with “duplicate” refresh/event workers, it’s always seemed to be more trouble than it’s worth. Maybe splitting the managers at the ext_management_system level was too far up the stack?

And, vSphere has no real concept of tenants or any way to reliably separate objects into useful groups.

vSphere actually does have a way to create different users and groups and allows you to scope inventory visibility differently for those different users/groups…we just don’t deal with that at all currently. I don’t think many people use the vSphere client as a service front end for users, but this is something we could investigate.


#11

  Providers and Managers

So, do we even need to expose this notion of separate managers at all? I mean, internally, it might make sense to segregate certain functionality into different scopes. But, what’s the point of exposing that concept to the end user, unless it’s going to be used in some interesting way?

I believe we had Hawkular used from both Middleware and OpenShift, and we’re looking at Neutron being used from both OSP and RHV. I’m sure we’ll see the same with metric endpoints (like Prometheus) being relevant to multiple providers.

  User Experience

I don’t know how to clearly define what the user experience should be. I can just say that even when I limit the scope to just Providers and Managers, I’m overwhelmed and confused as a user.

+1 to sorting the UX out.

  Others?

Probably … I got tired typing though. I’ll try to revisit this. But, I wanted to get this out in the open and start the discussion.

I know this is a bit beyond what you were focusing on, but while we are revisiting the modeling, it would be interesting to see how we can make providers “lighter”, i.e., support dynamic inventory and allow constructing ad-hoc providers (“without code”, or at least with a pre-defined schema) so someone can push the relevant inventory/events/metrics to a dynamic inventory, Elastic, Prometheus, etc., and then model the entities (and hence the UI) they want in CFME.

Thanks,
Itamar


#12

Great that you’re bringing this up.
Here are some links for the archeologists:

I totally agree with @agrare that adding managers and providers to the mix of the existing ext_management_system added more confusion than it actually helped. Unfortunately, the most helpful thing with providers and managers, namely linking all managers to a single provider object, was rarely used; instead we use parent_managers to link, e.g., a network manager to a cloud manager, which makes the thing even more confusing.
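Purely to illustrate the difference between the two linking styles (invented names, not the real ActiveRecord associations):

```ruby
# Style A: every manager hangs off a single provider object.
Provider = Struct.new(:name, :managers, keyword_init: true)
provider = Provider.new(name: "Amazon", managers: ["cloud manager", "network manager"])

# Style B: the network manager points back at the cloud manager as its parent,
# and nothing ties them to a common provider object.
Manager = Struct.new(:name, :parent_manager, keyword_init: true)
cloud   = Manager.new(name: "cloud manager",   parent_manager: nil)
network = Manager.new(name: "network manager", parent_manager: cloud)
```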

We have a few competing challenges at work here:

  1. A user that expects all (tenant, scoping) concepts of his provider to be present
  2. A single pain, ehm, pane of glass (UX)
  3. Support 1 & 2 via a flexible code domain modeling

In order to make 2 work, we need to fully understand 1. And I guess what we did is: based on understanding one provider, we came up with a domain model (3) to support it, but without 2 in place for all providers.

Maybe we can research 1 and look at all the commonalities of the various providers, and then get together with the UX team on how the commonalities can be used in the UI. Once that is done, we can revisit our provider/manager concept and see where it needs tweaking or re-architecture.

Actually, I would refrain from making the code more consistent before doing this, as we don’t know what model we are shooting for.


#13

I really like the idea of a “lighter” provider, as this would remove a lot of boilerplate code. But it would need a more thorough re-arch and a re-think of how all those operations are executed. E.g., adding a VM to inventory should be straightforward without any ManageIQ code (just send inventory to the inventory service), but if we want to power down that VM, we currently need a piece of code that runs inside a core worker.

Maybe we re-arch that into a service model; then APIs become the contract, and you can create a provider at runtime and point it at your own service. I guess then we have a real platform :slight_smile:
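As a very rough sketch of what “create a provider at runtime and point it at your own service” could mean, the provider record might just be a descriptor of where to call for each capability. All field names here are made up; nothing like this exists today.

```ruby
# Hypothetical descriptor only -- illustrating the "provider as API contract" idea.
require "json"

dynamic_provider = {
  "name"         => "my-external-provider",
  "capabilities" => {
    "inventory" => { "url" => "https://inv.example.com/api/inventory" },
    "events"    => { "url" => "https://inv.example.com/api/events" },
    "power"     => { "url" => "https://ops.example.com/api/power" }  # operations, not just data
  }
}

# Registering such a provider would be an API call carrying this payload,
# instead of shipping provider-specific code into a core worker.
puts JSON.pretty_generate(dynamic_provider)
```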


#14

@iheim there is actually some work done on POCing dynamic inventory for middleware by Caina; please see here:
http://lists.jboss.org/pipermail/hawkular-dev/2017-July/003938.html