GlusterFS Research / Overview


#1

The following is a high-level overview of the Gluster scalable network filesystem, for reference & to explore the possibility of adding support for it to MiQ.

Gluster - Scalable Network Filesystem

Incorporates rapid provisioning & automatic failover w/out a central metadata server

A brick is the basic unit of storage, represented by an export directory on a server in the trusted storage pool (cluster).

A cluster is a group of linked computers working closely together.

On all servers in the storage pool:

yum install glusterfs-server
systemctl start glusterd.service

The ‘gluster’ command is used to interface w/ & manage the server (syntax below is from an older release; run ‘gluster help’ for the installed version):

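  • volume info [all|<VOLNAME>]
  • volume create <NEW-VOLNAME> [stripe <COUNT>] [replica <COUNT>] [transport <tcp|rdma|tcp,rdma>] <NEW-BRICK> …
  • volume delete <VOLNAME>
  • volume start <VOLNAME>
  • volume stop <VOLNAME> [force]
  • volume rename <VOLNAME> <NEW-VOLNAME>
  • volume set <VOLNAME> [<KEY> <VALUE>]
  • volume help
  • volume add-brick <VOLNAME> <NEW-BRICK> …
  • volume remove-brick <VOLNAME> <BRICK> …
  • volume rebalance-brick <VOLNAME> (<BRICK> <NEW-BRICK>) start
  • volume rebalance <VOLNAME> stop
  • volume rebalance <VOLNAME> status
  • volume replace-brick <VOLNAME> (<BRICK> <NEW-BRICK>) start|pause|abort|status|commit
  • volume log filename <VOLNAME> [BRICK] <DIRECTORY>
  • volume log locate <VOLNAME> [BRICK]
  • volume log rotate <VOLNAME> [BRICK]
  • peer probe <HOSTNAME>
  • peer detach <HOSTNAME>
  • peer status
  • peer help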

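Creating a replicated volume from the pool & mounting it:

# from server1:
gluster peer probe server2
# from server2 (probing back registers server1 by hostname):
gluster peer probe server1
# create & start a 2-way replicated volume, one brick per server
gluster volume create gv0 replica 2 server1:/data/brick1/gv0 server2:/data/brick1/gv0
gluster volume start gv0
# mount with the native (FUSE) client, then write 100 test files through the mount
mount -t glusterfs server1:/gv0 /mnt
for i in `seq -w 1 100`; do cp -rp /var/log/messages /mnt/copy-test-$i; done
# tune a volume option
gluster volume set gv0 performance.cache-size 256MB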
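The replica 2 volume above keeps one copy of each file on each server. A minimal C sketch of how a brick list is grouped into replica sets, assuming the rule from the Gluster docs that the brick count must be a multiple of the replica count and that consecutive bricks on the volume create command line form each replica set. The function names are hypothetical; this models the layout only and is not GlusterFS code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: number of replica sets for a brick list, or -1 if
 * the brick count is not a positive multiple of the replica count
 * (gluster refuses to create such a volume). */
int replica_set_count(size_t brick_count, size_t replica)
{
    if (replica == 0 || brick_count == 0 || brick_count % replica != 0)
        return -1;
    return (int)(brick_count / replica);
}

/* Hypothetical helper: replica set of the brick at `brick_index`
 * (0-based, in command-line order). Consecutive bricks share a set. */
size_t replica_set_of(size_t brick_index, size_t replica)
{
    assert(replica > 0);
    return brick_index / replica;
}

/* Print which replica set each brick lands in. */
void print_replica_sets(const char *bricks[], size_t n, size_t replica)
{
    if (replica_set_count(n, replica) < 0) {
        printf("invalid: %zu bricks is not a multiple of replica %zu\n", n, replica);
        return;
    }
    for (size_t i = 0; i < n; i++)
        printf("replica set %zu: %s\n", replica_set_of(i, replica), bricks[i]);
}
```

With 4 bricks and replica 2 the same grouping yields two replica sets, which is a Distributed Replicated volume.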

Volume Types:

  • Distributed - Distributed volumes distribute files throughout the bricks in the volume.
  • Replicated - Replicated volumes replicate files across bricks in the volume.
  • Striped - Striped volumes stripe data across bricks in the volume.
  • Distributed Striped - Distributed striped volumes stripe data across two or more nodes in the cluster.
  • Distributed Replicated - Distributed replicated volumes distribute files across replicated bricks in the volume.
  • Distributed Striped Replicated - Distributed striped replicated volumes distribute striped data across replicated bricks in the cluster.
  • Striped Replicated - Striped replicated volumes stripe data across replicated bricks in the cluster.
  • Dispersed - Dispersed volumes are based on erasure codes, providing space-efficient protection against disk or server failures.
  • Distributed Dispersed - Distributed dispersed volumes distribute files across dispersed subvolumes.
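Distributed volumes place each file on one subvolume by hashing its name, which is how Gluster avoids a central metadata server: each directory's layout gives every subvolume a range of the 32-bit hash space. A simplified C sketch, assuming equal-width ranges and using 32-bit FNV-1a as a stand-in for Gluster's own hash (so real placements will differ). It also computes the usable capacity fraction of a dispersed subvolume of n bricks with redundancy r, which is (n - r)/n. All names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in hash: 32-bit FNV-1a. GlusterFS's DHT uses its own hash function,
 * so real file placements will differ from this sketch. */
uint32_t name_hash32(const char *name)
{
    uint32_t h = 2166136261u;                 /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)name; *p; p++) {
        h ^= *p;
        h *= 16777619u;                       /* FNV prime */
    }
    return h;
}

/* Split the 32-bit hash space into `subvols` contiguous ranges of equal
 * width and return the index of the range that contains `hash`. */
size_t subvol_for_hash(uint32_t hash, size_t subvols)
{
    assert(subvols > 0);
    uint64_t width = ((uint64_t)UINT32_MAX + 1) / subvols;
    size_t idx = (size_t)(hash / width);
    return idx < subvols ? idx : subvols - 1; /* last range absorbs the remainder */
}

/* Which subvolume (brick, replica set or disperse set) stores this name. */
size_t subvol_for_name(const char *name, size_t subvols)
{
    return subvol_for_hash(name_hash32(name), subvols);
}

/* Dispersed subvolume of n bricks with redundancy r: usable fraction of raw
 * capacity is (n - r) / n, and up to r bricks can fail without data loss. */
double disperse_usable_fraction(unsigned n, unsigned r)
{
    assert(n > 0 && r < n);
    return (double)(n - r) / (double)n;
}
```

Because placement is computed from the name, any client can locate a file without asking a metadata server; adding bricks changes the ranges, which is what rebalance then corrects.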

Data Structures (from libglusterfs inode.h):

struct _inode_table {
      pthread_mutex_t    lock;
      size_t             hashsize;    /* bucket size of inode hash and dentry hash */
      char              *name;        /* name of the inode table, just for gf_log() */
      inode_t           *root;        /* root directory inode, with inode
                                         number and gfid 1 */
      xlator_t          *xl;          /* xlator to be called to do purge and
                                         the xlator which maintains the inode table */
      uint32_t           lru_limit;   /* maximum LRU cache size */
      struct list_head  *inode_hash;  /* buckets for inode hash table */
      struct list_head  *name_hash;   /* buckets for dentry hash table */
      struct list_head   active;      /* list of inodes currently active (in an fop) */
      uint32_t           active_size; /* count of inodes in active list */
      struct list_head   lru;         /* list of inodes recently used.
                                         lru.next most recent */
      uint32_t           lru_size;    /* count of inodes in lru list  */
      struct list_head   purge;       /* list of inodes to be purged soon */
      uint32_t           purge_size;  /* count of inodes in purge list */

      struct mem_pool   *inode_pool;  /* memory pool for inodes */
      struct mem_pool   *dentry_pool; /* memory pool for dentries */
      struct mem_pool   *fd_mem_pool; /* memory pool for fd_t */
      int                ctxcount;    /* number of slots in inode->ctx */
};
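The inode_hash buckets let the table find an inode by its gfid. A minimal sketch of that lookup with hypothetical simplified types (sk_inode, sk_table) and singly linked chains instead of list_head. The bucket function, which takes the last two gfid bytes modulo hashsize, is an assumption chosen for illustration (gfids are random UUIDs, so those bytes spread evenly):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned char gfid_t[16];

/* Hypothetical simplified inode: just the gfid and a bucket-chain link. */
struct sk_inode {
    gfid_t gfid;
    struct sk_inode *hash_next;   /* next inode in the same bucket */
};

/* Hypothetical simplified table: hashsize buckets of inode chains. */
struct sk_table {
    size_t hashsize;              /* number of buckets */
    struct sk_inode **inode_hash; /* bucket heads */
};

/* Assumed bucket function: last two gfid bytes modulo the table size. */
size_t gfid_bucket(const gfid_t gfid, size_t hashsize)
{
    assert(hashsize > 0);
    return (((size_t)gfid[14] << 8) | gfid[15]) % hashsize;
}

/* Push the inode onto the head of its bucket chain. */
void table_insert(struct sk_table *t, struct sk_inode *in)
{
    size_t b = gfid_bucket(in->gfid, t->hashsize);
    in->hash_next = t->inode_hash[b];
    t->inode_hash[b] = in;
}

/* Walk only the matching bucket, comparing full 16-byte gfids. */
struct sk_inode *table_find(const struct sk_table *t, const gfid_t gfid)
{
    for (struct sk_inode *in = t->inode_hash[gfid_bucket(gfid, t->hashsize)];
         in != NULL; in = in->hash_next)
        if (memcmp(in->gfid, gfid, sizeof(gfid_t)) == 0)
            return in;
    return NULL;
}
```

Under this scheme the root inode (gfid 1) lands in bucket 1; collisions only lengthen one chain, and the full gfid comparison keeps lookups exact.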

struct _inode {
      inode_table_t       *table;         /* the table this inode belongs to */
      uuid_t               gfid;          /* unique identifier of the inode */
      gf_lock_t            lock;
      uint64_t             nlookup;
      uint32_t             fd_count;      /* Open fd count */
      uint32_t             ref;           /* reference count on this inode */
      ia_type_t            ia_type;       /* what kind of file */
      struct list_head     fd_list;       /* list of open files on this inode */
      struct list_head     dentry_list;   /* list of directory entries for this inode */
      struct list_head     hash;          /* hash table pointers */
      struct list_head     list;          /* active/lru/purge */

      struct _inode_ctx   *_ctx;          /* placeholder for information kept
                                             about the inode by different xlators */
};
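The active, lru & purge lists and their size counters track an inode's lifecycle. A simplified sketch, assuming these rules: an inode with references sits on active; when its last reference is dropped it moves to the tail of lru; while lru_size exceeds lru_limit the oldest lru entries move to purge; re-referencing an lru inode moves it back to active. The types and functions are hypothetical, and the real code also takes nlookup into account and frees purged inodes:

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of list_head. */
struct list { struct list *prev, *next; };

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

void list_init(struct list *h) { h->prev = h->next = h; }

void list_add_tail(struct list *e, struct list *h)
{
    e->prev = h->prev;
    e->next = h;
    h->prev->next = e;
    h->prev = e;
}

void list_del(struct list *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    list_init(e);
}

/* Hypothetical simplified inode and table. */
enum sk_where { ON_ACTIVE, ON_LRU, ON_PURGE };

struct sk_inode {
    struct list list;      /* links the inode into active, lru or purge */
    unsigned ref;          /* reference count */
    enum sk_where where;   /* which list the inode is on */
};

struct sk_table {
    struct list active, lru, purge;
    unsigned active_size, lru_size, purge_size;
    unsigned lru_limit;
};

void table_init(struct sk_table *t, unsigned lru_limit)
{
    list_init(&t->active);
    list_init(&t->lru);
    list_init(&t->purge);
    t->active_size = t->lru_size = t->purge_size = 0;
    t->lru_limit = lru_limit;
}

/* A newly linked inode starts with one reference, on the active list. */
void inode_link_new(struct sk_table *t, struct sk_inode *in)
{
    in->ref = 1;
    in->where = ON_ACTIVE;
    list_add_tail(&in->list, &t->active);
    t->active_size++;
}

/* Take a reference; an unreferenced inode is revived from lru to active. */
void inode_ref(struct sk_table *t, struct sk_inode *in)
{
    assert(in->where != ON_PURGE);   /* purged inodes are about to be freed */
    if (in->ref++ == 0) {
        list_del(&in->list);
        t->lru_size--;
        list_add_tail(&in->list, &t->active);
        t->active_size++;
        in->where = ON_ACTIVE;
    }
}

/* Drop a reference; at zero the inode goes to the lru tail, and the oldest
 * lru entries (at the head) move to purge while lru_size > lru_limit. */
void inode_unref(struct sk_table *t, struct sk_inode *in)
{
    assert(in->ref > 0 && in->where == ON_ACTIVE);
    if (--in->ref > 0)
        return;
    list_del(&in->list);
    t->active_size--;
    list_add_tail(&in->list, &t->lru);
    t->lru_size++;
    in->where = ON_LRU;

    while (t->lru_size > t->lru_limit) {
        struct sk_inode *oldest = container_of(t->lru.next, struct sk_inode, list);
        list_del(&oldest->list);
        t->lru_size--;
        list_add_tail(&oldest->list, &t->purge);
        t->purge_size++;
        oldest->where = ON_PURGE;
    }
}
```

Keeping unreferenced inodes on a bounded lru list lets repeated lookups of the same file reuse cached inodes, while lru_limit caps memory use.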

// file / directory entry
struct _dentry {
      struct list_head   inode_list;   /* list of dentries of inode */
      struct list_head   hash;         /* hash table pointers */
      inode_t           *inode;        /* inode of this directory entry */
      char              *name;         /* name of the directory entry */
      inode_t           *parent;       /* directory of the entry */
};
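Dentries link a name in a parent directory to an inode, so lookups are keyed on parent & name together and the same name in two directories yields distinct entries; resolving a path walks it one component at a time from the root. A sketch with hypothetical simplified types; the bucket hash (a name hash mixed with the parent's address) is an illustrative assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical simplified types modeled loosely on _inode/_dentry. */
struct sk_inode { const char *label; };

struct sk_dentry {
    struct sk_dentry *hash_next;  /* next dentry in the same bucket */
    struct sk_inode *inode;       /* inode this name refers to */
    struct sk_inode *parent;      /* directory containing the name */
    const char *name;             /* name of the entry */
};

#define NAME_BUCKETS 64

/* Assumed bucket function: (hash * 31 + c) over the name, plus the parent's address. */
size_t dentry_bucket(const struct sk_inode *parent, const char *name)
{
    size_t h = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p; p++)
        h = h * 31 + *p;
    return (h + (size_t)(uintptr_t)parent) % NAME_BUCKETS;
}

void dentry_add(struct sk_dentry *buckets[], struct sk_dentry *d)
{
    size_t b = dentry_bucket(d->parent, d->name);
    d->hash_next = buckets[b];
    buckets[b] = d;
}

/* Find the inode named `name` inside directory `parent`, or NULL. */
struct sk_inode *dentry_lookup(struct sk_dentry *const buckets[],
                               const struct sk_inode *parent, const char *name)
{
    for (struct sk_dentry *d = buckets[dentry_bucket(parent, name)]; d; d = d->hash_next)
        if (d->parent == parent && strcmp(d->name, name) == 0)
            return d->inode;
    return NULL;
}

/* Resolve a '/'-separated path one component at a time from `root`. */
struct sk_inode *resolve(struct sk_dentry *const buckets[], struct sk_inode *root,
                         const char *path)
{
    char comp[256];
    struct sk_inode *cur = root;
    while (cur && *path) {
        size_t len = strcspn(path, "/");
        if (len >= sizeof comp)
            return NULL;              /* component too long for this sketch */
        memcpy(comp, path, len);
        comp[len] = '\0';
        if (len > 0)                  /* skip empty components ("/a//b") */
            cur = dentry_lookup(buckets, cur, comp);
        path += len;
        if (*path == '/')
            path++;
    }
    return cur;
}
```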

libgfapi-ruby example (Ruby bindings for libgfapi, the GlusterFS userspace client library):

require 'glusterfs'

# Create a virtual mount (through libgfapi, no FUSE mount needed)
volume = GlusterFS::Volume.new('my_volume')
volume.mount('1.2.3.4')

begin
  # Create a new directory
  dir = GlusterFS::Directory.new(volume, '/some_dir')
  dir.create

  # Create a file from a string or bytes
  data = "Hello, Gluster!\n" # sample payload
  file = GlusterFS::File.new(volume, '/gfs/file/path')
  size = file.write(data)
  puts "Written #{size} bytes"

  # Copy an existing local file to Gluster (the block closes the local file)
  File.open('/path/to/file', 'rb') do |existing_file|
    file = GlusterFS::File.new(volume, '/gfs/file/path')
    size = file.write_file(existing_file)
    puts "Written #{size} bytes"
  end

  # Read a file
  file = GlusterFS::File.new(volume, '/gfs/file/path')
  contents = file.read
ensure
  # Unmount the virtual mount, even if an operation above raised
  volume.unmount
end

Links:
- http://en.wikipedia.org/wiki/GlusterFS
- http://www.gluster.org/
- http://gluster.readthedocs.org/en/latest/
- https://github.com/spajus/libgfapi-ruby