Debugging DRb Recycled Object error with Automate


#1

Automate methods are run using DRb using a client/server model. The Automation Engine is the server and the Automate method is the client. They exchange objects over the DRb channel (http), objects that cannot be Marshaled and dumped from the server side are sent as references (undumped) to the client.
When the client tries to access the reference objects, they need to be available on the server side. Previously we used to disable the garbage collection on the server side when the automate method was running so that we don’t inadvertently garbage collect the object. This leads to the Recycled object error in the automate method.

Starting with Bottvinik we have removed the disabling of GC during the execution of Automate methods, since most of the objects that we sent via reference to the automate method via DRb are cached which prevents them from being garbage collected. There have been instances where some of the Ruby hashes use a default proc for initializing hash values. These hashes cannot be dumped to the client and DRb sends over a reference to the object, but since these hashes don’t get tucked in the cache on the server side they get garbage collected and we see this intermittent Recycled object error.

The following code snippet allows for debugging of the recycled objects.
Copy this into a file called drb_recycling.rb under vmdb/config/initalizers


require 'drb'
module DRb 
    class DRbMessage
      def dump(obj, error=false)
        orig_obj = obj 
         obj = make_proxy(obj, error) if obj.kind_of? DRbUndumped
  
         $miq_ae_logger.info("DRB orig_obj type: #{orig_obj.class.name}, undumped?: #{orig_obj.kind_o  f?(DRbUndumped)}, object_id: #{orig_obj.object_id} (0x#{orig_obj.object_id.to_s(16).rjust(14, '0')  }), obj type:#{obj.class.name}, object_id: #{obj.object_id} (0x#{obj.object_id.to_s(16).rjust(14,   '0')})")
  
        begin
          str = Marshal::dump(obj)
          rescue Exception => e
          $miq_ae_logger.error("Marshall Exception #{e}")
          $miq_ae_logger.error("Object #{obj.inspect}")
          $miq_ae_logger.error("call stack #{caller.take(25)}")
          orig_obj = obj 
          obj = make_proxy(obj, error)
  
 
         $miq_ae_logger.info("DRB(rescue) orig_obj type: #{orig_obj.class.name}, undumped?: #{orig_  obj.kind_of?(DRbUndumped)}, object_id: #{orig_obj.object_id} (0x#{orig_obj.object_id.to_s(16).rjust(14, '0')}), proxy obj type:#{obj.class.name}, object_id: #{obj.object_id} (0x#{obj.object_id.to_s(16).rjust(14, '0')})")
 
          str = Marshal::dump(obj)
        end
        [str.size].pack('N') + str
    end 
  end
end

After this file is created
restart the evm server
Run your tests, the above changes will log every object being sent across from the server side to the automate method along with their ids. When we get a recycled object error on the Automate method side it typically has an object id in hex. This object id can be searched in the logs to see which kind of object it is. You might need additional debugging lines in the automate method to see which method call returned that object.