Smart State Analysis


#1

The following details the “Smart State” disk analysis subsystem (SSA) implemented as part of the CloudForms / ManageIQ Cloud Management Suite.

The majority of the logic resides in the ‘disk’, ‘fs’, and ‘VolumeManager’ directories under gems/pending in the manageiq project. There, various disk structures are parsed from raw images and used to represent higher-level filesystem constructs. The subsystem is exposed through the MiqVM interface, which uses these modules to present the data to the client (the MiQ application).

The current workflow between the subsystems is modeled in the following diagram:

Here we see calls to MiqVM being dispatched to the various disk, VolumeManager, and fs modules. The representation of the filesystems present on the disk is built incrementally, with each layer of storage technology applied to build higher-level constructs from those below it. Overall, the process can be summarized as follows:

  • MiqVM is initialized with path / specifiers of disks to analyze
  • Raw disk containers are parsed and encapsulated partitions passed to VolumeManager
  • VolumeManager constructs higher-level block objects (e.g. logical volumes), if applicable
  • All disks (physical and logical) are passed to the fs subsystem, which scans the block devices for root trees
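To make the second step concrete, the following is a minimal sketch of locating partitions in a raw container, assuming a classic MBR-partitioned image; extended partitions and GPT are not handled. `MbrPartition` and `parse_mbr` are hypothetical names, not ManageIQ's disk API. The layout used (four 16-byte entries starting at byte 446, the 0x55AA signature at byte 510, 512-byte sectors) is the standard MBR format.

```ruby
# Hypothetical sketch (not ManageIQ's disk API): list the primary
# partitions described by a classic MBR in the first sector of a raw image.
SECTOR_SIZE  = 512
TABLE_OFFSET = 446 # first of four partition table entries
ENTRY_SIZE   = 16

MbrPartition = Struct.new(:index, :type, :start_lba, :sector_count) do
  # Byte offset of the partition within the raw image.
  def byte_offset
    start_lba * SECTOR_SIZE
  end

  # Partition length in bytes.
  def byte_size
    sector_count * SECTOR_SIZE
  end
end

# Returns the non-empty primary partitions; raises ArgumentError when the
# sector is too short or the 0x55AA boot signature is absent.
def parse_mbr(sector0)
  raise ArgumentError, "need at least #{SECTOR_SIZE} bytes" if sector0.bytesize < SECTOR_SIZE
  raise ArgumentError, "missing MBR signature" unless sector0.byteslice(510, 2).bytes == [0x55, 0xAA]

  (0...4).filter_map do |i|
    entry = sector0.byteslice(TABLE_OFFSET + i * ENTRY_SIZE, ENTRY_SIZE)
    type = entry.getbyte(4) # partition type id, 0 = unused slot
    # Bytes 8..15: little-endian 32-bit start LBA and sector count.
    start_lba, count = entry.byteslice(8, 8).unpack("VV")
    next if type.zero? || count.zero?

    MbrPartition.new(i, type, start_lba, count)
  end
end
```

Each resulting `byte_offset` / `byte_size` pair is what a partition object would need in order to present its own zero-based block range to the volume and fs layers.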

Each layer of the SSA subsystem is implemented in a pluggable manner, allowing modules to be written for various disk, volume, and fs formats. Modules are responsible for detecting disk entities as well as extracting and unpacking the constructs needed to represent those entities for consumption by the layers above. Currently the following modules exist:

  • disk - parse partitions and blocks out of raw data streams

    • LargeFile (win32) - base file i/o class; uses a C module internally to read data from the local fs
    • LocalDev - uses LargeFile to access a local /dev device
    • Azure - uses the scvmm api to access data on the Azure cloud
    • MSVS - uses LargeFile to extract / parse various Microsoft Virtual Server disk images
    • QCOW - uses LargeFile to parse qcow disk images
    • Raw - uses LargeFile to parse raw disk images
    • RHEV - parse RHEV disk images dispatching to contained disk type
    • VMWare - uses LargeFile to parse various VMware disk images
    • VHDX - uses LargeFile to parse the VHDX disk image format
    • VIX - uses the VIX API to parse VMware vSphere disks
  • volume - consolidates underlying disks into managed volumes and exposes them via the disk api

    • LVM - parses the Linux Logical Volume Manager disk format from the specified volumes
    • LDM - parses the Microsoft Logical Disk Manager disk format from the specified volumes
  • fs - scans specified disks for filesystem signatures and parses supporting metadata to expose the filesystem api

    • AUFS - tree based UnionFS filesystem
    • EXT 3/4 - inode / tree based filesystem with journal
    • FAT32 - Microsoft’s table & bitmap based fs
    • NFS - network based filesystem access
    • ISO9660 - optical disk media filesystem
    • NTFS - Microsoft’s tree based filesystem
    • Native - uses underlying OS mount operations to expose fs api for device
    • Real - exposes current underlying system fs
    • Reiser - tree / journal based filesystem
    • Union - original Union FS
    • WebDAV - web based filesystem
    • XFS - high performance / capacity, tree/journaling fs
    • ZFS - combined fs / volume manager
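The pluggable detection described for these modules can be sketched as a registry of probes, each of which inspects a few bytes of a block device and reports whether it recognizes the format. This is a minimal sketch assuming a hypothetical `read(offset, length)` block interface (backed here by a Ruby string); `StringDevice` and `FsProbes` are illustrative names, not ManageIQ's loader API. The magic values are the standard on-disk signatures: ext2/3/4 stores 0xEF53 at byte 56 of the superblock, which begins at offset 1024; NTFS carries the OEM ID `NTFS` at byte 3 of the boot sector; XFS begins with `XFSB`. The FAT32 type label at byte 82 is informational, so it is treated here only as a heuristic.

```ruby
# Hypothetical sketch of a pluggable filesystem probe registry.
# Any device responding to read(offset, length) can be probed.
class StringDevice
  def initialize(bytes)
    @bytes = bytes.b
  end

  # Returns up to `length` bytes at `offset`, or "" past the end.
  def read(offset, length)
    @bytes.byteslice(offset, length) || "".b
  end
end

module FsProbes
  REGISTRY = {} # name => probe block, checked in registration order

  # Each plugin registers a name and a block returning true on a match.
  def self.register(name, &probe)
    REGISTRY[name] = probe
  end

  # Returns the name of the first matching probe, or nil if none match.
  def self.detect(dev)
    REGISTRY.find { |_name, probe| probe.call(dev) }&.first
  end
end

FsProbes.register(:ext) do |dev|
  # Superblock starts at byte 1024; s_magic (little-endian u16) is 56 bytes in.
  dev.read(1024 + 56, 2).unpack1("v") == 0xEF53
end
FsProbes.register(:xfs)   { |dev| dev.read(0, 4) == "XFSB".b }
FsProbes.register(:ntfs)  { |dev| dev.read(3, 8) == "NTFS    ".b }
# FAT32 type label in the boot sector: a heuristic, not an authoritative check.
FsProbes.register(:fat32) { |dev| dev.read(82, 8) == "FAT32   ".b }
```

Keeping probes in a table makes the detection order explicit, and supporting a new format becomes one more `register` call rather than a change to the central loader.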

The intent is to refactor / extract each of these modules into separate projects, repos, & gems so as to isolate functionality for testing and maintainability purposes. As we start looking to do this, it is helpful to model the filesystems as hierarchies of block constructs, with each layer consuming the blocks below it and using its internal algorithm to present its blocks to the layer above.
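The block-hierarchy model can be sketched with a single duck-typed contract, `read(offset, length)` plus `size`, that every layer both consumes and exposes. The classes below are hypothetical, not existing ManageIQ classes: `MemoryDisk` stands in for a raw image, `PartitionView` presents a window of the device below it (as the disk layer does for partitions), and `LinearVolume` concatenates several block devices into one address space, roughly what an LVM linear logical volume does (real LVM maps extents via on-disk metadata, which this sketch omits).

```ruby
# Hypothetical sketch: stacked block layers sharing read(offset, length) / size.
class MemoryDisk
  def initialize(bytes)
    @bytes = bytes.b
  end

  def size
    @bytes.bytesize
  end

  def read(offset, length)
    @bytes.byteslice(offset, length) || "".b
  end
end

# Exposes bytes [start, start + length) of the lower device as its own 0-based space.
class PartitionView
  attr_reader :size

  def initialize(lower, start, length)
    raise ArgumentError, "window exceeds device" if start + length > lower.size
    @lower, @start, @size = lower, start, length
  end

  def read(offset, length)
    return "".b if offset >= @size
    @lower.read(@start + offset, [length, @size - offset].min)
  end
end

# Concatenates devices end to end, like a linear logical volume.
class LinearVolume
  def initialize(*devices)
    @devices = devices
  end

  def size
    @devices.sum(&:size)
  end

  def read(offset, length)
    out = "".b
    @devices.each do |dev|
      if offset >= dev.size
        offset -= dev.size # requested range starts after this device
        next
      end
      chunk = dev.read(offset, [length, dev.size - offset].min)
      out << chunk
      length -= chunk.bytesize
      offset = 0 # later devices are read from their start
      break if length <= 0
    end
    out
  end
end
```

For example, with a `MemoryDisk` holding `"AAAABBBBCCCC"`, a `LinearVolume` over the windows `(0, 4)` and `(8, 4)` returns `"AACC"` for `read(2, 4)`: the read crosses the boundary between the two underlying partitions without either layer knowing about the other.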

We can also use the abstraction of block I/O to represent SSA as a series of analysis / extraction steps in which disk data is consumed, transformed, and then represented for subsequent consumption.
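Viewed as a series of steps, SSA becomes a fold over stages: each stage takes the previous stage's output and returns a transformed representation. The sketch below uses hypothetical stage names and toy data (hashes standing in for disks and partitions); it illustrates only the shape of such a pipeline, not ManageIQ's implementation.

```ruby
# Hypothetical sketch: SSA as ordered stages folded over an input.
# Any object responding to call(input) can act as a stage.
Stage = Struct.new(:name, :fn) do
  def call(input)
    fn.call(input)
  end
end

# Runs each stage on the previous stage's output; optionally records
# [stage name, output] pairs in `trace` for debugging.
def run_pipeline(stages, input, trace: nil)
  stages.reduce(input) do |acc, stage|
    result = stage.call(acc)
    trace&.push([stage.name, result])
    result
  end
end

# Toy stages: flatten disks into partitions tagged with their disk name,
# then report the partitions on which a filesystem was detected.
FIND_PARTITIONS = Stage.new(:partitions, ->(disks) {
  disks.flat_map { |d| d[:partitions].map { |p| p.merge(disk: d[:name]) } }
})

DETECT_FILESYSTEMS = Stage.new(:filesystems, ->(parts) {
  parts.select { |p| p[:fs] }.map { |p| "#{p[:disk]}:#{p[:fs]}" }
})
```

For example, running `[FIND_PARTITIONS, DETECT_FILESYSTEMS]` over `[{ name: "sda", partitions: [{ fs: "ext4" }, { fs: nil }] }]` yields `["sda:ext4"]`. Since stages only agree on their input and output shapes, each one can be tested by feeding it a fixture directly.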

In both models the interaction between layers / steps is standardized via the internal block and filesystem interfaces, though these will need to be extracted and shared among modules that reside in separate projects & repos. The benefit is a well-defined disk parsing & interaction interface in Ruby, through which different layers can easily be implemented and tested.
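One way to standardize the block interface across separately released gems is a small shared module that declares the contract, plus a reusable conformance check that each plugin gem can run in its own test suite. This is a sketch under assumptions: the `BlockIO` module, its two methods, `BlockIO.conformance_errors`, and the `ZeroDisk` example plugin are hypothetical, not an existing gem.

```ruby
# Hypothetical shared block I/O contract (not an existing gem).
module BlockIO
  REQUIRED = %i[size read].freeze

  # Default implementations signal that a plugin forgot to override them.
  def size
    raise NotImplementedError, "#{self.class} must implement #size"
  end

  def read(_offset, _length)
    raise NotImplementedError, "#{self.class} must implement #read"
  end

  # Exercises `dev` against the contract and returns a list of problems;
  # an empty list means these basic checks passed.
  def self.conformance_errors(dev)
    errors = REQUIRED.reject { |m| dev.respond_to?(m) }.map { |m| "missing ##{m}" }
    return errors unless errors.empty?

    begin
      n = dev.size
      return ["size must be a non-negative Integer"] unless n.is_a?(Integer) && n >= 0

      errors << "read must return a String" unless dev.read(0, [n, 16].min).is_a?(String)
      errors << "read past end must return an empty String" unless dev.read(n, 1) == ""
    rescue NotImplementedError => e
      errors << e.message
    end
    errors
  end
end

# Minimal conforming plugin: a device of `n` zero bytes.
class ZeroDisk
  include BlockIO

  def initialize(n)
    @n = n
  end

  def size
    @n
  end

  def read(offset, length)
    return "".b if offset >= @n
    "\0".b * [length, @n - offset].min
  end
end
```

`BlockIO.conformance_errors(ZeroDisk.new(32))` returns an empty list, while a class that includes `BlockIO` without overriding `size` reports that omission. Because the check only uses `respond_to?` and the two methods, it also works for duck-typed devices that never include the module.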

To implement the SSA subsystem as separate modules:

  • Everything from MiqVM and above would stay as-is as part of the current codebase
  • The current codebase would be cleaned up: various modules tidied for consolidation & extraction purposes, and code standards enforced
  • Introduce abstract block & filesystem i/o interfaces & gems
  • Extract disk modules from LargeFile & up into their own projects using block i/o, pull into MiQ
  • Extract VolumeManager modules into their own projects using block i/o, pull into MiQ
  • Extract fs modules into their own projects using block & filesystem i/o, pull into MiQ
  • Finally extract the disk, volume, and filesystem loaders / central logic into their own independent gems
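As an illustration of the extraction steps, an extracted fs module could ship as its own gem with a runtime dependency on the shared interface gems; pulling it into MiQ would then be a matter of adding that gem to the application's Gemfile. The gem names, versions, and author below are hypothetical placeholders; `Gem::Specification.new`, `add_dependency`, and `add_development_dependency` are standard RubyGems API.

```ruby
# Hypothetical gemspec for an extracted filesystem module; all names are placeholders.
require "rubygems"

spec = Gem::Specification.new do |s|
  s.name                  = "ssa-fs-ext"
  s.version               = "0.1.0"
  s.summary               = "ext3/ext4 reader for SmartState analysis (sketch)"
  s.authors               = ["Example Author"]
  s.files                 = Dir["lib/**/*.rb"]
  s.required_ruby_version = ">= 2.7"

  # Shared block and filesystem I/O interface gems (hypothetical names).
  s.add_dependency "ssa-block-io", "~> 0.1"
  s.add_dependency "ssa-fs-io", "~> 0.1"

  # Test framework used by the gem's own per-layer tests.
  s.add_development_dependency "minitest", "~> 5.0"
end
```

In a real `.gemspec` file the `Gem::Specification.new` block would simply be the file's last expression rather than being assigned to a variable.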

Tests can be implemented for each layer / component as it is extracted. Docs can be written for each format, covering supported and unsupported features. Finally, the entire process can serve as a template for extracting other gems and components from MiQ.
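Because each extracted layer depends only on the shared interfaces, its tests can run against small in-memory fixtures instead of real disk images. The following Minitest sketch assumes a hypothetical `OffsetView` layer (a window onto the device below it) and a `FakeDisk` fixture; both are defined inline for illustration and are not ManageIQ classes.

```ruby
# Hypothetical per-layer unit test: the layer under test is exercised against
# an in-memory fixture instead of a real disk image.
require "minitest/autorun"

# Layer under test (hypothetical): exposes a window of the device below.
class OffsetView
  attr_reader :size

  def initialize(lower, start, size)
    @lower, @start, @size = lower, start, size
  end

  def read(offset, length)
    return "".b if offset >= @size
    @lower.read(@start + offset, [length, @size - offset].min)
  end
end

# In-memory fixture standing in for a raw disk.
FakeDisk = Struct.new(:bytes) do
  def size
    bytes.bytesize
  end

  def read(offset, length)
    bytes.byteslice(offset, length) || "".b
  end
end

class OffsetViewTest < Minitest::Test
  def setup
    # "header" occupies bytes 0..5, so the 7-byte window starts at "PAYLOAD".
    @view = OffsetView.new(FakeDisk.new("headerPAYLOADtrailer".b), 6, 7)
  end

  def test_reads_are_relative_to_window_start
    assert_equal "PAYLOAD", @view.read(0, 7)
  end

  def test_reads_are_clamped_to_window
    assert_equal "LOAD", @view.read(3, 100)
  end

  def test_reads_past_end_are_empty
    assert_equal "", @view.read(7, 1)
  end
end
```

Fixtures this small make it easy to cover edge cases such as reads that straddle a window boundary, which are hard to construct reliably with full disk images.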


#2

Very helpful, thanks! :slight_smile:


#3

Very nice write-up, @mmorsi. A couple of comments:
The MSVS disk module is the VHD disk image handler. Both that and the VHDX module use the MiqHyperVDisk handler to access remote Microsoft disks (either VHD or VHDX) and only use LargeFile when parsing local disks.
While there is some code in the source tree for all the filesystems you list, several of them are not supported, including AUFS, Union FS, and ZFS.