Azure [Validating New VM] Timeout

fine
providers

#1

I’m using automate to provision a new Azure VM on a new fine-3 appliance, though this occurs when simply provisioning a VM from Compute -> Clouds -> Instances as well. It reaches the timeout value after 100 retries, though the VM does provision successfully.

Looking in evm.log I can see each failed call to poll_destination_in_vmdb, and the ems_ref has the correct values. I do not experience this issue when provisioning from Automate with VMware

Things I’m doing which might cause this:

  • I have 4 Azure and 3 VMware vCenter providers on this box. (16 cores, 56 GB RAM)
  • I am overwriting clone_options entirely in the Azure pre-provision stage, because I need some functionality which hasn’t made it to fine yet (market images, availability sets, setting license type on Windows machines, etc)

Thanks


#2

After digging through System -> Event -> EmsEvent in automate, it looks like many actions in azure trigger a refresh of the EMS, and since refreshing an Azure provider can take several hours (I have another issue open for performance issues in get_stack_templates), and since I see a refresh being triggered for the provider I’m attempting to provision to approximately every ten minutes, none of which have completed, I think the issue here is that the instances are never refreshed because these queued refresh tasks get stuck in get_stack_templates. Thoughts?

For the record, this provider has around 300 deployment templates when it does eventually refresh. I have providers in new subscriptions which have around 10, and those refresh in minutes.


#3

I decided to stop being an idiot and put in some additional logging in ems_inv_to_hashes, and it turned out that the refresher was hanging on get_images. I did a ‘patch’ and just commented that out on my appliance and all seems to be well now.


#4

The code for getting private images on unmanaged storage is not pretty because Azure gives us no API. Consequently we must iterate over every container on every storage account looking for them.

I’ve got a faraday branch in the works for azure-armrest. I’m hopeful that it will improve matters.


#5

I am wondering, how do you overwrite clone_options in pre-provision stage to handle availability sets? This would be very useful for me, but I am having trouble finding the name, and the Azure pre-provision stage only has a couple of lines in it.


#6

So, the code which allows one to overwrite clone_options is found here in the MIQ source code. That calls a function called prepare_for_clone_task, which is defined by each provider that supports cloning from a template. For Azure in particular, that definition is here.

You’ll notice that the prepare_for_clone_task method returns a hash that is the same as the hash expected by the Azure API, because that is PUT to the Azure API. Since clone_options is merged after the provider specific method is called, you can simply overwrite it by supplying the entire hash sometime before the provision state machine


#7

So, with availability sets as an example, you’re literally doing something like the following in / Cloud / VM / Provisioning / StateMachines / Methods / PreProvision ?

prov.cloud_options[:properties][:availabilitySet] = { “id”: “/subscriptions/{subscription-id}/resourceGroups/myresourcegroup/providers/Microsoft.Compute/availabilitySets/myavset” }


#8

Merge does not perform a deep merge, so if you do this you must overwrite the entire hash, i.e. you must set it according to the Azure VM REST API


#9

Which hash?


#10

Would you be willing to spare me the trouble and post one sanitized example where you set the availability set?


#11

Definitely, but during business hours tomorrow


#12

I appreciate it. I expect its my inexperience in breaking down data structures.


#13

def log(level, msg, update_message=false)
  $evm.log(level,"#{msg}")
  @task.message = msg if @task.respond_to?('message') && update_message
end

def dump_root()
  $evm.log(:info, "Begin $evm.root.attributes")
  $evm.root.attributes.sort.each { |k, v| log(:info, "\t Attribute: #{k} = #{v}")}
  $evm.log(:info, "End $evm.root.attributes")
  $evm.log(:info, "")
end

def add_security_groups()
  # ensure that the security group is set
  log(:info, "Processing add_security_groups...", true)
  ws_values = @task.options.fetch(:ws_values, {})
  if @task.get_option(:security_groups).blank?
    security_group_id   = @task.get_option(:security_groups) || ws_values[:security_groups] rescue nil
    security_group_id ||= @task.get_option(:security_groups_id) || ws_values[:security_groups_id] rescue nil
    unless security_group_id.nil?
      security_group = $evm.vmdb(:security_group).find_by_id(security_group_id)
    end
    if security_group
      log(:info, "Using security_group: #{security_group.name} id: #{security_group.id} ems_ref: #{security_group.ems_ref}")
      @task.set_option(:security_groups, [security_group.id, security_group.name])
      log(:info, "Provisioning object updated {:security_group => #{@task.options[:security_groups].inspect}}")
    end
  end
  log(:info, "Processing add_security_groups...Complete", true)
end

def add_key_pair()
  # ensure that the key_pair is set
  log(:info, "Processing add_key_pair...", true)
  ws_values = @task.options.fetch(:ws_values, {})
  if @task.get_option(:guest_access_key_pair).blank?
    key_pair_id   = @task.get_option(:guest_access_key_pair) || ws_values[:guest_access_key_pair] rescue nil
    key_pair_id ||= @task.get_option(:guest_access_key_pair_id) || ws_values[:guest_access_key_pair_id] rescue nil
    key_pair_id ||= @task.get_option(:key_pair) || ws_values[:key_pair_id] rescue nil
    unless key_pair_id.nil?
      key_pair = $evm.vmdb(:auth_key_pair_amazon).find_by_id(key_pair_id) || $evm.vmdb(:auth_key_pair_amazon).find_by_name(key_pair_id)
    end
    if key_pair
      log(:info, "Using key_pair: #{key_pair.name} id: #{key_pair.id} ems_ref: #{key_pair.ems_ref}")
      @task.set_option(:guest_access_key_pair, [key_pair.id, key_pair.name])
      log(:info, "Provisioning object updated {:guest_access_key_pair => #{@task.options[:guest_access_key_pair].inspect}}")
    end
  end
  log(:info, "Processing add_key_pair...Complete", true)
end

def parse_market_urn(urn)
  keys = %w(publisher offer sku version)
  Hash[keys.zip(urn.split(':'))]
end

def parse_storage_type(size)
  if size =~ /\w[Ss]\d{1,2}|\w\d{1,2}[Ss]/
    'Premium_LRS'
  else
    'Standard_LRS'
  end
end

def set_market_image(options)
  log(:info, "Processing set_market_image...", true)
  ws_values = @task.get_option(:ws_values)
  unless ws_values[:market_image].blank?
    image_ref = parse_market_urn(ws_values[:market_image])
    options[:properties][:storageProfile][:imageReference] = image_ref
    options[:properties][:storageProfile][:osDisk][:managedDisk] = {
      :storageAccountType => parse_storage_type(options[:properties][:hardwareProfile][:vmSize])
      }
    log(:info, "Clone options object updated {:imageReference => #{options[:properties][:storageProfile][:imageReference]}")
    log(:info, "Clone options object updated {:managedDisk => #{options[:properties][:storageProfile][:osDisk][:managedDisk]}")
  end
  log(:info, "Processing set_market_image...Complete", true)
end

def set_availability_set(options)
  log(:info, "Processing set_availability_set...", true)
  ws_values = @task.get_option(:ws_values)
  unless ws_values[:availability_set].blank?
    options[:properties][:availabilitySet] = {
      :id => "#{ws_values[:availability_set]}"
      }
    log(:info, "Clone options object updated {:availabilitySet => #{options[:properties][:availabilitySet]}")
  end
  log(:info, "Processing set_availability_set...Complete", true)
end

def add_nics(options)
  log(:info, "Processing add_nics...", true)
  
  nis = Azure::Armrest::Network::NetworkInterfaceService.new(@azure)
  
  rg = $evm.vmdb(:resource_group).find_by_id(@task.get_option(:resource_group))
  nic_options = {
    :name => "#{@task.get_option(:vm_name)}_nic0",
    :location => @provider.provider_region,
    :properties => {
      :ipConfigurations => [
        {
          :name => @task.get_option(:vm_name),
          :properties => {
            :subnet => { :id => $evm.vmdb(:cloud_subnet).find_by_id(@task.get_option(:cloud_subnet)).ems_ref }
            }
          }
        ]
      }
    }
  
  nic = nis.create("#{@task.get_option(:vm_name)}_nic0", rg.name, nic_options)
  options[:properties][:networkProfile][:networkInterfaces] << {:id => nic.id}
  log(:info, "Clone options object updated {:dataDisks => #{options[:properties][:networkProfile][:networkInterfaces]}")
  log(:info, "Processing add_nics...Complete", true)
end

def set_tags(options)
  log(:info, "Processing set_tags...", true)
  
  options[:tags] = {
    'ApplicationName' => '',
    'BuildDate' => Time.now.utc.strftime('%Y-%m-%d'),
    'BusinessFunction' => '',
    'BusinessOwner' => '',
    'Creator' => @request.requester.email,
    'Supervisor' => '',
    'Manager' => ''
    }
  
  log(:info, "Processing set_tags...Complete", true)
end

def add_disks(options)
  log(:info, "Processing add_disks...", true)
  ws_disks = []
    if @task.options.has_key?(:ws_values)
      ws_values = @task.options[:ws_values]
      # :ws_values=>{:add_disk1 => '20', :add_disk2=>'50'}
      ws_values.each {|k,v| ws_disks[$1.to_i] = v.to_i if k.to_s =~ /add_disk(\d*)/}
      ws_disks.compact!
    end
    if ws_disks.blank?
      # @task.options=>{:add_disk1 => '20', :add_disk2=>'50'}
      @task.options.each {|k,v| ws_disks[$1.to_i] = v.to_i if k.to_s =~ /add_disk(\d*)/}
      ws_disks.compact!
    end

    unless ws_disks.blank?
      lun_start_idx = 1
      
      ws_disks.each_with_index do |size_in_gb, idx|
        next if size_in_gb.zero?
        options[:properties][:storageProfile][:dataDisks] << {
          :name => "#{@task.get_option(:vm_name)}_dataDisk#{idx}",
          :lun => lun_start_idx + idx,
          :diskSizeGB => size_in_gb,
          :createOption => 'empty',
          :managedDisk => { 
            :storageAccountType => parse_storage_type(options[:properties][:hardwareProfile][:vmSize])
            }
          }
      end
      log(:info, "Clone options object updated {:dataDisks => #{options[:properties][:storageProfile][:dataDisks]}")
    end
  log(:info, "Processing add_disks...Complete", true)
end

def set_license_type(options)
  log(:info, "Processing set_license_type...", true)
  ws_values = @task.get_option(:ws_values)
  if ws_values[:market_image].downcase.include? 'windows'
    license_type = 'Windows_Server'
    options[:properties][:licenseType] = license_type
  	log(:info, "Clone options object updated {:licenseType => #{options[:properties][:licenseType]}")
  else
    # This is here against the day we need to set Windows_Client
    license_type = nil
  end
  log(:info, "Processing set_license_type...Complete", true)
end

def set_os_profile(options)
  log(:info, 'Processing set_os_profile...', true)
  ws_values = @task.get_option(:ws_values)
  if ws_values[:market_image].downcase.include? 'windows'
    options[:properties][:osProfile][:windowsConfiguration] = {
      :provisionVMAgent => true,
      :winRM => {
        :listeners => [
          {
            :protocol => 'http'
          }
        ]
      }
    }
  else
    options[:properties][:osProfile][:linuxConfiguration] = {
      :disablePasswordAuthentication => false
    }
  end
  log(:info, "Clone object updated {:osProfile => #{options[:properties][:osProfile]}}")
  log(:info, 'Processing set_os_profile...Complete', true)
end

def generate_clone_options()
  credentials = $evm.instantiate('/Configuration/MachineCredentials/Azure/Credentials/admin_account')
  log(:info, 'Processing generate_clone_options...', true)
  clone_options = {
    :properties => {
      :hardwareProfile => {
        :vmSize => $evm.vmdb(:flavor).find_by_id(@task.get_option(:instance_type)).name
        },
      :osProfile => {
        :adminUserName => @task.get_option(:root_username),
        :adminPassword => credentials.decrypt('password'),
        :computerName => @task.get_option(:vm_name)
        },
      :storageProfile => {
        :osDisk => {
          :name => "#{@task.get_option(:vm_name)}_osDisk",
          :createOption => 'fromImage',
          :caching => 'ReadWrite',
          :diskSizeGB => 128
          },
        :dataDisks => []
        },
      :networkProfile => {
        :networkInterfaces => []
        }
      }
    }
  set_os_profile(clone_options)
  set_availability_set(clone_options)
  set_market_image(clone_options)
  add_nics(clone_options)
  add_disks(clone_options)
  log(:info, 'Processing generate_clone_options...Complete', true)
  clone_options
end

begin
  require 'azure-armrest'
  require 'pp'

  ###############
  # Start Method
  ###############
  log(:info, "ManageIQ Automate Method Started", true)
  dump_root()

  # Get provisioning object
  @task     = $evm.root['miq_provision']
  @request  = @task.miq_request
  log(:info, "Provisioning ID:<#{@task.id}> Provision Request ID:<#{@task.miq_provision_request.id}> Provision Type: <#{@task.provision_type}>")

  @template  = @task.vm_template
  @provider  = @template.ext_management_system
  
  @azure = Azure::Armrest::Configuration.new(
    :client_id => @provider.authentication_userid,
    :client_key => @provider.authentication_password,
    :tenant_id => @provider.uid_ems,
    :subscription_id => @provider.subscription
    )
  
  clone_options = generate_clone_options
  @task.set_option(:clone_options, clone_options)
  
  # Log all of the options to the automation.log
  @task.options.each { |k,v| log(:info, "Provisioning Option Key(#{k.class}): #{k.inspect} Value: #{v.inspect}") }
  
  log(:info, "Task Options: #{@task.options.pretty_inspect}")

  ###############
  # Exit Method
  ###############
  log(:info, "ManageIQ Automate Method Ended", true)
  exit MIQ_OK

  # Set Ruby rescue behavior
rescue => err
  log(:error, "[(#{err.class})#{err}]\n#{err.backtrace.join("\n")}")
  @task.finished("#{err}") if @task && @task.respond_to?('finished')
  exit MIQ_ABORT
end

#14

That’s a big chunk of code to just look at, but the meat of it is generate_clone_options. You can see that it generates a hash according to the Azure API link I posted yesterday


#15

Ah ha, you weren’t kidding… you’re not just changing data values, you’re pulling objects out and manipulating… I see. Most helpful, gets me exactly where I need to be… thank you very much :slight_smile:


#17

Looks like an error in above:

[----] E, [2017-10-28T00:42:00.530059 #43469:815a10] ERROR – : Q-task_id([miq_provision_99]) Class [Configuration/MachineCredentials/Azure/Credentials] not found in MiqAeDatastore
[----] I, [2017-10-28T00:42:00.530660 #43469:815a10] INFO – : Q-task_id([miq_provision_99]) Processing generate_clone_options…
[----] E, [2017-10-28T00:42:00.674513 #43469:815a10] ERROR – : Q-task_id([miq_provision_99]) [(NoMethodError)undefined method decrypt' for nil:NilClass] /Rentpath/Cloud/VM/Provisioning/StateMachines/Methods/PreProvision:225:ingenerate_clone_options’
/Rentpath/Cloud/VM/Provisioning/StateMachines/Methods/PreProvision:276:in `’


credentials = $evm.instantiate(’/Configuration/MachineCredentials/Azure/Credentials/admin_account’)

:adminPassword => credentials.decrypt(‘password’),


#18

You probably don’t have all the same stuff setup that I do, like a service catalog item which populates the provision request with all the values that are usable by this method or, in this specific case, an Automation domain called Configuration that contains a namespace called MachineCredentials, which contains a namespace called Azure, a class called Credentials, and an instance called admin_account.

I would really recommend reading this book by Peter McGowan if you haven’t already. Additionally this repo contains a lot of very useful code.