Allow versioning on AE Datastore custom domain


#1

What about having an external filesystem tree for the Automation Engine Datastore? The integrated editor is not really convenient for developing code. I end up with a directory on my laptop that mimics the Domain tree and is versioned using Git. Unfortunately, I still have to copy/paste my methods into the editor and create my classes by hand.

I understand that storing the AE Datastore in the database provides consistency between the nodes. We could have a Git repository in the appliance (a sub-tree of /var/www/miq/vmdb/product/automate ?) that triggers sync with hooks. We could also develop (or have contributions for) tools such as a specific Eclipse plugin, a workflow designer, etc…

Also, for a better use of simple text editors, the data structure for classes could be serialized in JSON/YAML rather than XML.

Comments welcome !


#2

It’s like you are reading our minds :smiley:

A few of us have thought A LOT about using Git as a storage medium for the automation tree, instead of the database, and the possibilities it opens up are pretty awesome. My personal favorite that we’ve thought of is that we basically get version control in automate, and the ability to potentially “undo” something that was committed in an automate script.

@mkanoor has been working on this for a while, and he can speak to it a lot more. @mkanoor, do you think you could put together your thoughts?


#3

This has already been done. See https://github.com/ManageIQ/manageiq/tree/master/vmdb/db/fixtures/ae_datastore/ManageIQ for the current non-XML form of the tree that is being used. Note that Ruby files are just regular .rb files as well. Getting into this format was actually Phase 1 for getting to using Git as a storage medium.


#4

@Fryguy : Nice, exactly what I was thinking of. So, if I build the application and modify this tree, will it be applied automatically ? Or is it just some mockup/clone ?


#5

Technically, yes, but that would go against the spirit of the Automate structure. The ManageIQ directory there is meant to be the read-only domain that is shipped with the appliance. We now have the concept of domains in automate, where if you want to override a particular method or add your own classes, you create a new domain that represents your changes on top of it. Domains are searched in order, so your override domain is looked at first, and then if not found, we search the parent domain. This allows us to do things like upgrade the ManageIQ domain without there being a conflict with changes made by a customer.
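To make the search order concrete, here is a minimal sketch of the lookup described above. It is not the engine's actual code: the `Domain` struct, the numeric priorities and the paths are all made up for illustration.

```ruby
# Hypothetical in-memory model: each domain maps a "namespace/class/method"
# path to a method body. Names, priorities and paths are illustrative only.
Domain = Struct.new(:name, :priority, :methods)

# Return [domain_name, body] from the highest-priority domain that defines
# the path, or nil when no domain defines it.
def resolve(domains, path)
  domains.sort_by { |domain| -domain.priority }.each do |domain|
    body = domain.methods[path]
    return [domain.name, body] if body
  end
  nil
end

DOMAINS = [
  Domain.new("ManageIQ", 0, {
    "Sample/Lifecycle/provision" => "shipped provision",
    "Sample/Lifecycle/retire"    => "shipped retire"
  }),
  Domain.new("Customer", 10, {
    "Sample/Lifecycle/provision" => "customer override"
  })
].freeze

p resolve(DOMAINS, "Sample/Lifecycle/provision") # ["Customer", "customer override"]
p resolve(DOMAINS, "Sample/Lifecycle/retire")    # ["ManageIQ", "shipped retire"]
```

The override wins only for the path it defines; everything else still falls through to the shipped domain, which is why the ManageIQ domain can be upgraded without touching customer changes.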

So, to do what you need to do, you would create a sibling directory next to ManageIQ called Customer, or whatever you like, build it up with the changes, and then import that into the system. I’m not 100% sure if the import functionality has made its way into the UI but there are backend cli tasks that can do it. @mkanoor or @tinaafitz can give you more detail.


#6

That is what I was thinking of. I also thought that if it were a Git-managed directory, a hook would automatically update the system with the changes. Then you can stage your development: you create your code, push it to a dev instance, and if it is OK, you push it to the production instance. It would be more DevOps-ish :wink:


#7

Under the db/fixtures/ae_datastore directory, each subdirectory is a domain. The domain folders follow the structure described below.

Domain Rules

  1. A domain directory has a file called __domain__.yaml which stores the domain properties. Any directory that doesn’t contain the __domain__.yaml is skipped.
  2. A domain can contain either namespaces or classes. If any of the subdirectories don’t follow the namespace or class rules, they are skipped.

Namespace Rules

  1. A namespace directory has a file called __namespace__.yaml which contains the namespace properties
  2. A namespace can contain other namespaces or classes. If any of the sub directories don’t follow the namespace or class rules they will be skipped.

Class Rules

  1. Each class directory name has the class name followed by a .class suffix. e.g. Lifecycle.class
  2. Each class directory has a __class__.yaml which stores the class properties and schema values
  3. A class directory also contains the instance yaml files, stored individually. The instance file only needs to contain those fields from the schema which have a value. It inherits the others from the schema.
  4. A class directory optionally contains the methods folder

Instance Rules

  1. Each instance is stored individually in the class directory as a yaml file
  2. The instance yaml file is sparsely populated with the fields from the schema that have values. Any fields missing from the instance yaml file are inherited from the class schema
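
As a rough illustration of the sparse-instance rule, the sketch below fills in missing instance fields from the class schema. The YAML keys used here (`schema`, `default_value`, `fields`) are assumptions made for the example and may not match the real file layout.

```ruby
require "yaml"

# Assumed, simplified file contents; the real __class__.yaml layout may differ.
CLASS_DOC = YAML.safe_load(<<~DOC)
  schema:
    - name: timeout
      default_value: "30"
    - name: retries
      default_value: "3"
    - name: target
      default_value:
DOC

# A sparse instance: only the field that differs from the schema is present.
INSTANCE_DOC = YAML.safe_load(<<~DOC)
  name: quick
  fields:
    timeout: "5"
DOC

# Build the full field set, preferring the instance's value when it has one
# and falling back to the schema default otherwise.
def effective_values(class_doc, instance_doc)
  overrides = instance_doc.fetch("fields", {})
  class_doc.fetch("schema").each_with_object({}) do |field, values|
    name = field.fetch("name")
    values[name] = overrides.fetch(name) { field["default_value"] }
  end
end

VALUES = effective_values(CLASS_DOC, INSTANCE_DOC)
VALUES.each { |name, value| puts "#{name}: #{value.inspect}" }
```

Here `timeout` comes from the instance, while `retries` and `target` are inherited from the schema.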

Method Rules

  1. A methods directory exists under the class directory and is called __methods__
  2. Each method directory has yaml files to store the method properties and input parameters which are akin to the class schema. These are called method instance files.
  3. The method directory can optionally contain the ruby scripts for the inline method types.

During a reset (bin/rake evm:automate:reset) or seeding (at startup), the files from these directory trees are processed and imported into the Postgres database. The CFME system currently works out of the Automate model in the Postgres database.
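
For anyone who wants to check a tree before importing it, the directory rules above can be sketched as a small scanner. This is not the importer's real code, only an approximation of the skip rules; the helper names and the sample tree are made up.

```ruby
require "tmpdir"
require "fileutils"

# Top level: only directories containing __domain__.yaml count as domains.
def scan_domains(root)
  Dir.children(root).sort.flat_map do |entry|
    path = File.join(root, entry)
    next [] unless File.directory?(path) && File.exist?(File.join(path, "__domain__.yaml"))
    [[:domain, entry]] + scan_children(path, entry)
  end
end

# Below a domain or namespace: accept *.class directories with __class__.yaml
# and directories with __namespace__.yaml; skip everything else.
def scan_children(dir, rel)
  Dir.children(dir).sort.flat_map do |entry|
    path = File.join(dir, entry)
    child = "#{rel}/#{entry}"
    next [] unless File.directory?(path)
    if entry.end_with?(".class") && File.exist?(File.join(path, "__class__.yaml"))
      [[:class, child]]
    elsif File.exist?(File.join(path, "__namespace__.yaml"))
      [[:namespace, child]] + scan_children(path, child)
    else
      []
    end
  end
end

# Helper for the demo: create an empty file and its parent directories.
def touch(root, *parts)
  path = File.join(root, *parts)
  FileUtils.mkdir_p(File.dirname(path))
  FileUtils.touch(path)
end

FOUND = Dir.mktmpdir do |root|
  touch(root, "Customer", "__domain__.yaml")
  touch(root, "Customer", "Sample", "__namespace__.yaml")
  touch(root, "Customer", "Sample", "Lifecycle.class", "__class__.yaml")
  touch(root, "Customer", "notes", "readme.txt") # no marker file: skipped
  touch(root, "scratch", "todo.txt")             # not a domain: skipped
  scan_domains(root)
end

FOUND.each { |kind, path| puts "#{kind}: #{path}" }
```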

So if a user wants to put a particular domain under version control, in the meantime they could put that directory under GIT themselves. We are planning on adding direct GIT support for the Automate database in a future release.


#8

Here are some of the ideas we have for putting the automate database under version control.

Automate Versioning

Currently the AUTOMATE models are stored in Postgres in 7 different ActiveRecord classes:

  • MiqAeDomain (the highest level namespace, stored as MiqAeNamespace)
  • MiqAeNamespace (folders to separate out the classes)
  • MiqAeClass (stores the schema, which is a collection of MiqAeFields)
  • MiqAeInstance (stores the instance properties; values are stored in MiqAeValue)
  • MiqAeMethod (stores the method properties, script and input parameters (collection via MiqAeField))
  • MiqAeField (stores the class schema and method input parameters)
  • MiqAeValue (stores the instance values per instance/method based on the MiqAeField)

Whenever changes are made to the Automate Model, they are committed into the DB, and it’s not easy to do version control on the Automate Model.

In the Anand release we split up the Automate Model into Domains so that each domain can be owned by a different group (Community, Vendor, Customer, Site). Each domain has a priority. When the Automate Engine executes a model, it searches for instances/methods in the same location (namespace + class) across the domains and picks the one from the domain with the highest priority.

Another change implemented in Anand was splitting automation_base.xml into separate YAML files for each domain, namespace, class, instance and method. The fields and values are stored directly inside those YAML files and don’t have files of their own on disk.

The Automate model files are currently stored in the filesystem and are imported into the Postgres DB either at startup or explicitly via the Import/Export feature under Automate Explorer.

We are planning on using version control now that the model files are on the filesystem, split up as individual YAML or .rb files. We can make each domain part of a version control system like GIT.

Here are some of the points to ponder as we make this transition

Advantages of Storing Automate in DB

  • Active Record search/add/update/delete features
  • Multi Zone Synchronization
  • Column searches of data are easier

Advantages of GIT for AUTOMATE

  • Version Control

Challenges of implementing GIT especially in the AUTOMATE use case

  • Files would have to be read (cached) when doing attribute specific searches
  • Need a GIT client in each of the zones to synchronize the changes (git clone)
  • Support Active Record style functions
  • Remove all the ID specific functions
  • Since the YAML file has the field and the values inside of it, the MiqAeValue and MiqAeField classes would become irrelevant, so any methods calling into these classes would have to be updated

At a high level the goal is to do parallel development till we have all the functionality in the GIT Automate classes and then swap out the

MiqAeDomain,MiqAeNamespace,MiqAeClass,MiqAeInstance,MiqAeMethod,MiqAeField,MiqAeValue

with the YAML equivalents

MiqAeDomainYAML,MiqAeNamespaceYAML,MiqAeClassYAML,MiqAeInstanceYAML,MiqAeMethodYAML

The YAML classes would inherit from the MiqAeGit class which will provide all the ActiveRecord functions for Automate that are in use throughout the product in many different places.
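
To give a feel for what such a base class might look like, here is a very rough sketch. MiqAeGit does not exist yet, so everything here is hypothetical: the class names `YamlBackedRecord` and `InstanceRecord`, the `load_from` method and the flat one-file-per-record layout are assumptions chosen only to show ActiveRecord-style finders served from cached YAML.

```ruby
require "yaml"
require "tmpdir"

# Hypothetical base class: a tiny ActiveRecord-like query surface
# (all, where, find_by) over YAML files parsed once and cached in memory.
class YamlBackedRecord
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  class << self
    # Parse every *.yaml file in dir once; later queries hit the cache.
    def load_from(dir)
      @records = Dir.glob(File.join(dir, "*.yaml")).sort.map do |file|
        new(YAML.safe_load(File.read(file)))
      end
    end

    def all
      @records || []
    end

    # Attribute search without a database: filter the cached records.
    def where(conditions)
      all.select do |record|
        conditions.all? { |key, value| record.attributes[key.to_s] == value }
      end
    end

    def find_by(conditions)
      where(conditions).first
    end
  end
end

# Stand-in for something like MiqAeInstanceYAML.
class InstanceRecord < YamlBackedRecord; end

HITS = Dir.mktmpdir do |dir|
  File.write(File.join(dir, "small.yaml"), { "name" => "small", "cpus" => 1 }.to_yaml)
  File.write(File.join(dir, "large.yaml"), { "name" => "large", "cpus" => 8 }.to_yaml)
  InstanceRecord.load_from(dir)
  InstanceRecord.where(cpus: 8).map { |record| record.attributes["name"] }
end

puts HITS.inspect # ["large"]
```

Note that there are no IDs anywhere: records are addressed by their attributes, which is in line with the "remove all the ID specific functions" challenge above.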

There would be some modeling needed to store the GIT REPO information:

  • Each domain would be stored in a separate repository.
  • The domain names and the GIT URL/credentials would be stored in a relational database (Postgres).
  • For the editable domains we would keep a history of all the commits.

The Administrator would have to arbitrate between the available commits from different automate editors and decide which ones should be used by different automation requests, based on some criteria (production/QA).

The commit to use per domain would be stored in the $evm.root object so that the Automate engine can use the right commit to process a request and can support multiple commits.

Automate stores relationships, which are akin to paths in a filesystem. These are edited by the user, so they could be in mixed case. Automate internally ignores the case of domains, namespaces, classes, instances, methods, and the fields in schemas and methods. Since the filesystem can be case insensitive or case sensitive, git would have to be configured to be case insensitive.
.git/config
[core]
ignorecase = true

This should be done whenever we create a new domain, so that the setting stays local to the automate domain’s repository and does not apply to every git repo on the system.
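
Independently of the git setting, the case handling itself could be sketched like this. It is only an illustration with made-up names: one helper matches a user-typed path against stored names while ignoring case, the other flags names that differ only by case and would therefore clash on a case-insensitive filesystem.

```ruby
# Names as they might appear on disk (illustrative only).
ON_DISK = [
  "Customer/Sample/Lifecycle.class",
  "Customer/Sample/lifecycle.class",
  "Customer/Sample/Retirement.class"
].freeze

# Find the first stored name matching a user-typed path, ignoring case.
def lookup(names, requested)
  names.find { |name| name.casecmp?(requested) }
end

# Group names that differ only by case; each group is a potential clash.
def case_collisions(names)
  names.group_by(&:downcase).values.select { |group| group.size > 1 }
end

puts lookup(ON_DISK, "customer/sample/RETIREMENT.CLASS")
# Customer/Sample/Retirement.class
p case_collisions(ON_DISK)
# [["Customer/Sample/Lifecycle.class", "Customer/Sample/lifecycle.class"]]
```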

Will the Automate Explorer be able to show the different commits and allow users to make changes to previous commits (amend commits inside of Automate Explorer)?

There might be other things I have missed; this is just a collection of thoughts on this topic.


#9

Some more points which were partially mentioned here, that git will buy us, are:

Fixing an issue where automation state machines can get into a bad place if the model is changed in the middle of the run.

State machines can pause themselves while running, and when they restart, the full workspace is reinstantiated. However, at present it is reinstantiated from the latest code. If someone modifies the state machine itself, the ones currently running can get into a really weird state. This can all be fixed with @mkanoor’s suggestion of keeping the git SHA with the execution, so that when we reinstantiate the workspace we do so at that specific git SHA, where nothing has changed.
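
A toy version of the idea, with the model held in a hash instead of a real git repository. The SHAs, state names and the `Run` struct are invented for the example; the point is only that a resumed run reads the model at the SHA it recorded, not at the latest one.

```ruby
# In-memory stand-in for a Git-backed model: SHA => ordered state list.
MODEL_AT = {
  "aaa111" => %w[validate provision notify],
  "bbb222" => %w[validate approve provision notify] # edited while runs are paused
}.freeze
LATEST_SHA = "bbb222"

# A run remembers the SHA it started on and its position in the state list.
Run = Struct.new(:sha, :position) do
  # Return the next state, always reading the model at the pinned SHA.
  def step
    state = MODEL_AT.fetch(sha)[position]
    self.position += 1
    state
  end
end

run = Run.new("aaa111", 0)
FIRST = run.step                 # "validate"

# The run pauses; only its SHA and position are persisted.
SAVED = run.to_h

# Meanwhile someone edits the model and the latest SHA moves to bbb222.
resumed = Run.new(SAVED[:sha], SAVED[:position])
SECOND = resumed.step            # "provision", still following aaa111

puts FIRST, SECOND
```

Had the resumed run used `LATEST_SHA` instead, position 1 would have been `approve`, a state that did not exist when the run started.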

Rollbacks and audit log tracking can be more robust.

Basically, the git history only ever moves forward; we never roll it backwards. This is the proper way to use git, and in doing so we get some interesting auditing. For example, John modifies an automate URI reference to point to something different, saves his change, and we make a commit with his information. Mike realizes this breaks a lot of stuff and wants to “undo” John’s change. We DON’T roll back to the previous commit; instead we leverage git’s revert capability to “undo” it as a new commit. This gives us a really nice audit log.

2394abdf (1 day ago) Initial commit
3912bdfc (5 hours ago) Changed URI - Saved by John
6429abdf (1 hour ago) Revert commit 3912bdfc "Changed URI - Saved by John" - Saved by Mike
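
The forward-only behaviour can be mimicked with a small append-only history. This is a simplified in-memory sketch, not git itself: sequential integer ids stand in for SHAs, and the `History` and `Commit` names are made up.

```ruby
# changes maps each key to [old_value, new_value] so a commit can be undone.
Commit = Struct.new(:id, :author, :message, :changes)

class History
  attr_reader :commits, :state

  def initialize
    @commits = []
    @state = {}
  end

  # Apply updates and append a commit; history is never rewritten.
  def commit(author, message, updates)
    changes = updates.to_h { |key, value| [key, [@state[key], value]] }
    @state = @state.merge(updates)
    @commits << Commit.new(@commits.size + 1, author, message, changes)
    @commits.last
  end

  # Undo a commit by appending a NEW commit that restores the old values.
  # (A key that did not exist before is reset to nil in this sketch.)
  def revert(id, author)
    target = @commits.find { |c| c.id == id } || raise(ArgumentError, "unknown commit #{id}")
    undo = target.changes.to_h { |key, (old, _new)| [key, old] }
    commit(author, "Revert commit #{id} \"#{target.message}\"", undo)
  end
end

LOG = History.new
LOG.commit("system", "Initial commit", { "uri" => "/a" })
LOG.commit("John", "Changed URI", { "uri" => "/b" })
LOG.revert(2, "Mike")

LOG.commits.each { |c| puts "#{c.id} #{c.message} - Saved by #{c.author}" }
```

John's commit stays in the log after Mike's revert, which is what makes the history usable as an audit trail.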

Using history to manage dev/test/prod workflow.

The system could aid in managing this workflow by knowing which git SHAs currently represent dev/test/prod. When the system uses automate it would go against the current prod SHA, and thus automate developers would feel safer in making changes knowing they won’t kill production. Moving test into production is a matter of updating the git SHA for prod.
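
One way to picture this is a small table of environment pointers. The sketch below is only an assumption about how such bookkeeping might look; the `Environments` class and its methods are hypothetical, and the SHAs are just sample values.

```ruby
# Tracks which known SHA each environment (dev/test/prod) currently uses.
class Environments
  def initialize(known_shas)
    @known = known_shas
    @pointers = {}
  end

  # Point an environment at a SHA, refusing SHAs that are not in the history.
  def point(env, sha)
    raise ArgumentError, "unknown SHA #{sha}" unless @known.include?(sha)
    @pointers[env] = sha
  end

  def sha_for(env)
    @pointers.fetch(env)
  end

  # Promotion is just copying one environment's pointer to another.
  def promote(from, to)
    point(to, sha_for(from))
  end
end

ENVS = Environments.new(%w[2394abdf 3912bdfc 6429abdf])
ENVS.point(:prod, "2394abdf")
ENVS.point(:test, "6429abdf")

PROD_BEFORE = ENVS.sha_for(:prod)
ENVS.promote(:test, :prod)
PROD_AFTER = ENVS.sha_for(:prod)

puts "prod: #{PROD_BEFORE} -> #{PROD_AFTER}"
```

Nothing is copied or merged when test goes to production; only the pointer moves, so going back is the same operation with the previous SHA.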

Note this could also be done with domains, where a test domain is built on top of the latest production domain, but marked as “not executable” or something. Then when the domain is ready it can either be merged into the production domain or just have the “not executable” flag turned off.