OpenStack Support


#1

All,

Thank you very much for your hard work!! AFAIK, this is the first Open Source CMP! I would love to contribute code to assist with data collection against OpenStack environments (distro, architecture, installed packages, etc.) This would really help with OpenStack adoption in the enterprise space, where IT governance can be strict. However, I only see three possible ways to get the same level of data you are retrieving from other environments:

  1. Utilize an application on the hypervisors which periodically executes guestfish/libguestfs queries against each /var/lib/nova/{instance-id}/disk, and alerts ManageIQ over AMQP (and/or has a REST API for gathering data on demand).

  2. Install an application on each instance (opening the door to collection of data from OpenStack public cloud environments), and have it report back to ManageIQ.

  3. Try to submit a new upstream OpenStack project just for gathering this info.

As this represents a serious architectural decision, I would like to know from the core developers which path (or none of them!) is best to pursue for adding good reporting capability to ManageIQ on OpenStack.

Greg


#2

Hey there,

First off, thanks for your interest! We’re really excited to get this out in the open.

Let me lay out the ways in which we gather information from providers, and what we’re currently doing with OpenStack. Then, we can talk more about what areas we’re lacking or what needs more help.

A ManageIQ admin user creates a Cloud Provider record for an OpenStack instance. We currently consider a single Keystone endpoint as an OpenStack instance. This Cloud Provider record includes the administrator’s credentials. This allows ManageIQ to connect to an OpenStack instance as the Admin user (of the Admin tenant) and ask for data.

We use the Fog library for connecting to OpenStack, once the Cloud Provider connection is created.

There are a few different ways we collect data from OpenStack:

  1. Images: ManageIQ collects image disk information for both running VMs and dormant image disks for all providers. For OpenStack, because there is currently no way to stream image bytes, this requires downloading the entire image disk locally and then reading the blocks we’re interested in. We endearingly term this the “fleecing” process.

  2. Inventory: ManageIQ also collects general inventory about nearly everything we can get our hands on for the various providers. For OpenStack, this means using the Fog library to request as much as we can about VMs, networking, images, etc. You can get a glimpse into this here. As you can probably tell from the code path there, this is called the “EMS Refresh” process, where “EMS” means “External Management System”. (A minimal Fog-based sketch follows this list.)

  3. Events: ManageIQ listens for events on various providers, and each provider handles events differently. For OpenStack, this means tapping into the AMQP bus and listening for events. We currently handle listening on RabbitMQ and Qpid buses. Events are extremely useful in the ManageIQ world, because they let the system know when something has changed under the covers, e.g., when a user creates a new VM by going directly through Horizon.

  4. Metrics: Finally, ManageIQ likes to present usage statistics back to users across all of the supported providers. For OpenStack, this means using Fog to connect to the Ceilometer service and getting as many metrics as we can understand.
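To give a flavor of the inventory side (item 2 above), here’s a minimal Fog-based sketch of what a collection pass might look like; the credentials and endpoint are placeholders, and the real EMS Refresh code collects far more than this:

require 'fog'

# Placeholder credentials; ManageIQ reads these from the Cloud Provider record.
compute = Fog::Compute.new(
  :provider           => 'openstack',
  :openstack_auth_url => 'http://keystone.example.com:5000/v2.0/tokens',
  :openstack_username => 'admin',
  :openstack_api_key  => 'secret',
  :openstack_tenant   => 'admin'
)

# Walk the basic inventory visible to the admin tenant: instances and images.
compute.servers.each { |server| puts "#{server.id}  #{server.name}  #{server.state}" }
compute.images.each  { |image|  puts "#{image.id}  #{image.name}" }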

We have most of this code written today. However, it’s important to point out that we’re always looking for ways to improve upon it. There’s a lot of code to get through, so if you’re interested in any particular part, we can always take some time and guide you through some specific areas. Then, it’s just a matter of getting your feet wet. :slight_smile:

I hope this helps. Please let us know if and how you wanna help and we’ll be all ears!

Thanks again!


gdb


#3

gdb,

Thanks for the reply and encouragement! I was completely unaware #1 was happening. How are you downloading each image to the ManageIQ server and then “fleecing” them? What do I need to do to enable this? Also, where in the tree is the OpenStack fleecing process happening?

Greg


#4

I believe that we use Fog to read the image. And, I may have looked into my crystal ball a bit here. I think that fleecing of OpenStack images is still under development (which means it’s a great time to get involved :smile:)

The best person to talk about the fleecing code is Rich Oliveri. I’ll ping him as soon as I see him online.

Also, join the #manageiq IRC channel on freenode if you’re not already there. I’m blomquisg in that channel and I believe that Rich’s nick is “roliveri”.

Thanks!


Greg Blomquist (gdb)


#5

Hi Greg,

Welcome to the community!

As gdb mentioned, “smart state analysis” (fleecing) of OpenStack images has already been implemented.
To obtain the actual disk contents of the image, we use the Ruby Fog gem to issue a “get image” request
to Glance. Once we have the bits, fleecing is performed by code that is common to most, if not all, providers, including OpenStack.

You can think of fleecing as an automated forensic analysis of the target (virtual machine, template, image, instance, etc). Once we have the contents of the virtual disk, we interpret various layers of metadata:

disk container -> partition information -> logical volume manager -> filesystem -> application

This gives us a full view of the guest environment of the target. Once we have that view, we’re able to extract the information we’re interested in: users, groups, installed software, services, etc.

Most of the fleecing code is under the manageiq/lib directory. Under lib, there are directories for the various metadata layers that need to be interpreted: disk, fs, VolumeManager, etc.

Code that’s specific to OpenStack fleecing is in lib/OpenStackExtract/MiqOpenStackImage.rb, but that will probably change as the fleecing code gets refactored.

The biggest issue we currently have with OpenStack fleecing is the need to download the entire image to the ManageIQ appliance (server). This is because Glance doesn’t support partial reads of the image. Ideally, we’d like to enhance Glance to support reading image data based on offset and length. Given this enhancement, we could then implement a disk module that would read the image data directly through the API, eliminating the need to download the entire image.
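Purely to illustrate the shape such a disk module could take (nothing below exists today; the class and the get_image_range call are invented for the example):

# Hypothetical sketch only: a Glance-backed "disk" that the common fleecing
# code could read through, if Glance ever supports offset/length reads.
class GlanceRangedDisk
  def initialize(glance_client, image_id)
    @glance   = glance_client
    @image_id = image_id
  end

  # Read +length+ bytes starting at +offset+, without downloading the image.
  def read(offset, length)
    # get_image_range is an imagined API call, not part of Glance or Fog today.
    @glance.get_image_range(@image_id, offset, length)
  end
end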

Thanks for your interest in contributing to this feature! Please let me know if/when you have any additional questions.

rpo


#6

All,

Thanks for the support! It’s inspiring to see how much you want to help get the community involved! I’ve read through your responses, and there were a few points that I felt are important to re-examine.

Getting the images from Glance will not provide you with the “running” image in Nova. These images can be thought of as base images, used either to manually construct a server or to automatically build an instance with Heat, Cloudify, Puppet/Chef, etc. So examining these for governance purposes can be very misleading. For instance, a user could convert their CentOS image to RHEL, install a bunch of weird packages, disable SELinux, etc. Also, pulling each image down will tax the OpenStack management network (images are often cached on the KVM nodes, resulting in faster spin-up and less traffic).

I completely understand the desire to have this project remain agentless, though. Therefore, I thought through ways to accomplish this without putting something on the guest OS or even on the hypervisor. I wrote some code to illustrate :smile:

Using ruby and the guestfs gem, I have a small app which does the following:

  • Grab the hypervisor list from the Nova API or the CLI
  • Iterate through each hypervisor, doing the following:
    • Mount the hypervisor’s /var/lib/nova directory to the local host with sshfs
    • Locate the instances and their “disk” images on the mounted directory
    • Parse each instance with guestfs (grabbing product, version, etc.)
    • Place the data into an object
    • Unmount the remote directory

The output currently looks like:

# ./openstack-inspect-guests.rb caladan.cloud-ninja.org
Mounting hypervisor: caladan.cloud-ninja.org
root@caladan.cloud-ninja.org's password: 
Processing: b63f1d2e-db20-4a0b-b568-750735dd3170 ... [7 seconds]
Processing: 3314b4da-95e0-4c61-a9f5-7ce6149244cb ... [20 seconds]
Instance b63f1d2e-db20-4a0b-b568-750735dd3170
  Product name: Windows 7 Ultimate
  Type:         windows
  Version:      6.1
  Distro:       windows
  Arch:         x86_64
  Hostname:     win7
  Drives:       {"C"=>"/dev/sda2"}
Instance 3314b4da-95e0-4c61-a9f5-7ce6149244cb
  Product name: CentOS release 6.5 (Final)
  Type:         linux
  Version:      6.5
  Distro:       centos
  Arch:         x86_64
  Hostname:     localhost.localdomain
  Drives:       {"/"=>"/dev/mapper/vg_system-lv_os", "/boot"=>"/dev/sda1", "/opt/rh/postgresql92/root/var/lib/pgsql/data"=>"/dev/mapper/vg_data-lv_pg", "/var/www/miq_tmp"=>"/dev/sda3"}

There is a lot more you can grab with guestfs (like the contents of files, the list of installed apps, etc.). I’d love your feedback on getting this info from OpenStack using this method. I believe it would be much faster than manually transferring and “fleecing” each image.
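The core of the inspection is roughly the following (simplified, with error handling omitted; the mount point and instance ID are placeholders, and the path layout may vary by deployment):

require 'guestfs'

# Assumes the hypervisor's /var/lib/nova directory is already sshfs-mounted
# read-only at mount_point, as described in the steps above.
mount_point = '/mnt/hypervisor'
instance_id = 'b63f1d2e-db20-4a0b-b568-750735dd3170'
disk        = File.join(mount_point, instance_id, 'disk')

g = Guestfs::Guestfs.new
g.add_drive_opts(disk, :readonly => 1)
g.launch

g.inspect_os.each do |root|
  puts "Product name: #{g.inspect_get_product_name(root)}"
  puts "Type:         #{g.inspect_get_type(root)}"
  puts "Version:      #{g.inspect_get_major_version(root)}.#{g.inspect_get_minor_version(root)}"
  puts "Distro:       #{g.inspect_get_distro(root)}"
  puts "Arch:         #{g.inspect_get_arch(root)}"
  puts "Hostname:     #{g.inspect_get_hostname(root)}"
  puts "Mounts:       #{g.inspect_get_mountpoints(root)}"
end

g.close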

Please see my code here:

Greg

(ghayes@redhat.com)


#7

Hi Greg,

I’m sorry, I didn’t provide enough detail in my last reply.

To fleece running OpenStack instances, we simply snapshot the instance and fleece the resultant image. So, we do get the running state of the instance, even though we’re acquiring the data through glance.
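For illustration, the snapshot step via Fog looks roughly like this (a sketch, not our actual code; “compute” is a Fog::Compute OpenStack connection and the naming is arbitrary):

server    = compute.servers.get(instance_id)
snap_name = "miq-fleece-#{server.id}"

# Ask Nova to snapshot the running instance into a new Glance image.
compute.create_image(server.id, snap_name)

# Wait for the snapshot to become ACTIVE before pulling it from Glance.
snapshot = nil
loop do
  snapshot = compute.images.all.find { |i| i.name == snap_name }
  break if snapshot && snapshot.status == 'ACTIVE'
  sleep 5
end

# snapshot.id is what we then fetch and hand to the common fleecing code.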

As I mentioned in my previous reply, the biggest issue we currently have with OpenStack fleecing is the need to download the entire image to the ManageIQ appliance (server). So the primary need we have in this area is an enhancement to Glance to support reading image data based on offset and length.

Our experience in other environments, VMware for example - using VDDK, has shown that fleecing through such a mechanism is not only performant, but fairly scalable. Given the relatively low overhead and small footprint of the fleecing stack, we’re able to fleece a number of VMs simultaneously on a single ManageIQ appliance - irrespective of the VMs’ storage location or hypervisor.

Rich


#8

Yes, I agree that is a problem. What is the scalability of snapshotting thousands of instances and transferring each of them to the server (even many at a time)? For instance, my Win 7 qcow2s are 25GB – imagine the network saturation and additional Swift storage for snapping and copying a large environment of those. It is totally implausible that this could be done quickly and efficiently at scale. Even with Glance reading the offsets, you are still incurring a storage penalty. In my demo application I fleece the image on the hypervisor itself without downloading it, and without installing additional software (or agents) on that server (in seconds). Wouldn’t this make much more sense?

Greg


#9

Greg,

What you have is access to the raw storage where the VM lives. If ManageIQ has access to the same, it can do the same kind of analysis as your script via libguestfs.

The main issue is getting access to the raw storage, which libguestfs mounts via the operating system and which ManageIQ mounts via its internal methods.

Oleg


#10

What you say is “totally implausible” has already been done in other environments. For VMware, we routinely snapshot VMs before they are fleeced. One could argue that any live image should be fleeced from a snapshot, but that’s another topic. As are the various types of snapshots, their overhead and intended uses, and which should be implemented in OpenStack.

Reading by offset and length eliminates the need to download the entire image, so the time it takes is based on the amount of data read, as opposed to the size of the image. You say that your demo isn’t downloading the data, but it is. You’re reading the data via sshfs, which is transferring the data over the network. I would argue that the per-byte overhead of sshfs is greater than that of targeted reads over TCP. For VMware, core fleecing times are on the order of seconds, and we run multiple fleeces simultaneously.

Even if the snapshot aspect becomes untenable, that would be an argument for a true copy-on-write snapshot facility that would only consume storage as needed and take little time to perform. In the absence of such a facility, I would entertain using sshfs to access the disk files. Once we have access to the files, however, I would just attach them to our existing fleecing stack, so there would be no need for additional scripts.

It seems to me we are trying to work around shortcomings in the environment in regard to support for fleecing. I think resources would be better spent addressing those shortcomings, as opposed to implementing a special case, orthogonal fleecing path for OpenStack.

Rich


#11

Rich,

I agree with you, and I believe you are saying that you agree with me that you can (and probably should) fleece the image through sshfs (as Glance has no method supporting offsets now). Of course sshfs transfers some portion of the image, but my point is that it doesn’t transfer the ENTIRE image, as you previously described ManageIQ doing today. That is something I think we can all agree is a step in the right direction. Transferring every image to the ManageIQ box is going to come at a cost which could otherwise be avoided.

VMware and OpenStack are very different animals, and we can discuss both their shortcomings and benefits at length. :smile: However, the “marketing” around this project/product certainly suggests CloudForms/ManageIQ is a CMP for OpenStack (today, not some future OpenStack which doesn’t yet exist). Also, to be frank, it’s this project which is implementing a “previously proprietary” one-off method of reading image data, not libguestfs. If you can do it faster, then it should be pushed into the guestfs source. Since it’s clear from this thread that you guys don’t want to rewrite the process to be different from VMware, what can I do to help you make it “more sane” in OpenStack?

BTW - Have you committed an upstream patch against Juno for implementing offset-based fetching in Glance?

Greg


#12

Greg,

I believe you are basing your arguments on assumptions which are different from Rich’s and mine.

Your assumption (and the code snippet you provided) works on the hypervisor, which can see the storage for the VM it is fleecing. Because it can see the storage, it can mount it and utilize libguestfs.

ManageIQ runs as a VM. It does not have the same view of the storage for the VMs it is fleecing as in your example. Therefore, it cannot mount it and utilize libguestfs. AFAIK, OpenStack does not expose APIs to find the hypervisors and their VMs.

Therefore, ManageIQ utilizes the APIs at its disposal - currently the Glance API to get the ENTIRE image. I hear there is a pull request upstream to support offset and length in Glance.

If ManageIQ leveraged libguestfs inside its VM/appliance, nothing would change. libguestfs would not see anything more than ManageIQ code sees.

Please correct me if I am wrong in my understanding of the assumptions in each situation.

Oleg


#13

Oleg,

The example code can be run on any host, even a VM inside OpenStack. The sshfs portion takes care of making the image appear local (basically, mounting the remote hypervisor’s /var/lib/nova directory through ssh). The libguestfs portion is called afterwards on the remote image, and it completes its fleecing without transferring the whole image. I’m trying to find a good way to calculate the bytes transferred when doing this; however, FUSE filesystems don’t show up in /proc/diskstats and can’t be seen with iostat. Even if you still use the same ManageIQ backend fleecing code (instead of libguestfs), wouldn’t it make more sense to access the image this way?

Greg


#14

Greg,

When you say “basically, mounting the remote hypervisor’s /var/lib/nova directory through ssh”, this is where I get confused.

  1. How would ManageIQ figure out the hypervisor of the VM/Instance we are trying to fleece in OpenStack?
  2. How would this work for an image, which has no hypervisor association on OpenStack?

Thanks,
Oleg


#15

That’s a comparative statement, but it’s not clear what you’re comparing it to.

Does it make more sense than downloading the entire image? Then I would say yes.
Does it make more sense than just reading the needed blocks through an API? Then I’d say no.

Next week, when I have more time, I’ll reply more fully. As you point out, I think we need to distinguish between initial implementation and future direction.

I have a number of questions, two of which were already raised by Oleg in his latest reply.

Rich


#16

Guys,

Thanks! This has been an awesome discussion.

I think I found a way to do this in OpenStack that we can all agree on:

  1. Use Swift as the backend store for Glance
  2. Snapshot the image
  3. Read the bytes via the Swift API, which supports reading specific ranges of an object (see here)
  4. Delete snapshot

The downside is requiring Glance to be backed by Swift, but using a Swift backend adds a lot of benefits too (image versioning, HA, images distributed across AZs or even other cloud providers, etc.)
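For example, a ranged read against the Swift object backing a snapshot image is just an HTTP GET with a Range header. A rough sketch (the storage URL, token, and container/object names are placeholders):

require 'net/http'
require 'uri'

# Swift honors the standard HTTP Range header on object GETs.
# storage_url is the tenant's object-store URL from the Keystone service catalog.
def swift_read_range(storage_url, token, container, object_name, offset, length)
  uri = URI("#{storage_url}/#{container}/#{object_name}")
  req = Net::HTTP::Get.new(uri)
  req['X-Auth-Token'] = token
  req['Range']        = "bytes=#{offset}-#{offset + length - 1}"

  Net::HTTP.start(uri.host, uri.port, :use_ssl => uri.scheme == 'https') do |http|
    http.request(req).body   # only the requested bytes come back
  end
end

# e.g., read the first 512 bytes (partition table) of the snapshot object
# swift_read_range(storage_url, auth_token, 'glance', snapshot_id, 0, 512)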

What do you think?

As for my script, it will be fun to try to figure out Oleg’s first question. I’ll take a stab at that tonight. As for the second question, this is OpenStack-specific, so there shouldn’t be images that are somehow unassociated with OpenStack that it needs to find. From a technical point of view, though, any image you can point it at will be processed.

Greg


#17

Note: Please see my last post on Swift range reads (probably the best option currently for integrating with ManageIQ), but I still wanted to follow up with an answer to Oleg’s question.

You can track down the instances through the Nova API. I’ve updated the example code here and added a "-i" option for locating a single instance in OpenStack and parsing it.

Greg


#18

You’re using the CLI. The question was how to accomplish this through the API (meaning the REST API), through the Fog gem for example.


#19

Rich,

Sure, I’ll recode for that. It is exposed as an attribute on each instance called “OS-EXT-SRV-ATTR:hypervisor_hostname” (see the fog Git repo here). I was having some issues connecting to OpenStack with fog last night (possibly related to an old hpfog gem on my system).

Greg


#20

Okay, the fog code for getting the “OS-EXT-SRV-ATTR:hypervisor_hostname” for an instance is added to the script. Essentially, to query this you do:

openstack = Fog::Compute.new(
  :provider           => 'openstack',
  :openstack_auth_url => oscreds.auth_url,
  :openstack_username => oscreds.username,
  :openstack_api_key  => oscreds.password,
  :openstack_tenant   => oscreds.tenant_name
)
server     = openstack.servers.get(instance)
hypervisor = server.os_ext_srv_attr_hypervisor_hostname
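That hostname can then be plugged straight into the sshfs step from the earlier script, for example (note that the OS-EXT-SRV-ATTR attributes are only returned for admin users by default):

# Mount that hypervisor's nova directory and point guestfs at the instance disk.
system("sshfs root@#{hypervisor}:/var/lib/nova /mnt/#{hypervisor} -o ro")
disk = "/mnt/#{hypervisor}/#{server.id}/disk"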