Bullet for detecting n+1 queries in arbitrary blocks of ruby code


#1

bullet is a useful tool I have used in the past to profile code for N+1 queries. A good discussion of bullet and N+1 queries is found here

So, most bullet use cases out there involve hooking the rack/rails request/response cycle for a web site. In other words, they detect N+1 queries that occur in a single request/response cycle. That’s useful for our UI/Rest API but we have lots of code that runs outside of this cycle.

I had used this code in a vmdb/config/initializers/bullet.rb file to enable profiling of arbitrary blocks. Note: I doesn’t get enabled in production mode and I ran this code last many months ago so it might need some tweaking to work with latest bullet.

unless Rails.env.production?
   if defined?(Bullet)
     Bullet.enable = true
     Bullet.bullet_logger = true
     Bullet.alert = true
   end

  def with_bullet
    require 'bullet'
    Bullet.enable = true
    Bullet.rails_logger = true

    raise "requires block" unless block_given?
    Bullet.start_request
    yield
    Bullet.perform_out_of_channel_notifications if Bullet.notification?
    Bullet.end_request
  end
end

With this, I could then run Vm.some_really_slow_method in rails console and it would print any detected N+1 queries in the rails log.

Perhaps it would be useful for others and even be something we can propose upstream in bullet.

It might be neat to hook this type of thing for detecting slow tests using rspec’s before/after blocks.


#2

Just a word of caution.

Don’t use unit tests to drive bullet. And run it in production mode on your local machine.

I’ve used bullet to speed up unit tests just to slow down production by pulling back too much data.


#3

+1 @kbrock… with great power comes great responsibility. All of these tools provide information that we need to really understand before we blindly change things. Optimizing for the 1% worst case is not a good idea.


#4

@jrafanie I would be interested in developing a methodology for baselining/benchmarking number of queries for the backend processes, however I haven’t found a clear solution to doing so. I’ve done some analysis on UI transactions via turning on Rails debug and viewing the number of queries (cached and uncached) tied to a page load. I’ve done this analysis by tying the UI Worker PID to the DEBUG statements in the log file displaying the queries tied to the specific UI transaction.

I assume I could do something very similar for backend processes, it would be more of a matter of being able to consume the amount of log output ManageIQ can generate when we start generating a workload on ManageIQ. Perhaps there is a better method that we could log # of queries per message aside from just the worker dequeue/delivery timings without having to do post-processing of log files.


#5

@akrzos Check out ActiveRecord::TestCase#assert_queries which uses
ActiveRecord::SQLCounter

I have a vmdb/config/initializers/sql_query_counter.rb with some local changes for mysql and oracle schema related queries to ignore and some other logic for choosing to selectively ignore some queries.

I haven’t run this in many months, but with this initializer, I can run assert_queries(10) { SomeClass.new.method_with_lots_of_queries } in rails console and it yells at me if the the query counts don’t match.

It could be easily changed to yell if > x number of queries occur and could be used in combination with a Rails runner script to run a specific task and verify the query count.

Here is the contents of my sql_query_counter.rb initializers file that was borrowed from the Rails code above:

if !Rails.env.production? && defined?(ActiveSupport::Notifications)
  # lifted from lib/active_record/test_case.rb
  def assert_queries(num = 1, options = {})
    ignore_none = options.fetch(:ignore_none) { num == :any }
    SQLCounter.clear_log
    yield
  ensure
    the_log = ignore_none ? SQLCounter.log_all : SQLCounter.log
    if num == :any
      STDERR.puts("1 or more queries expected, but none were executed.") if the_log.size >= 1
    else
      mesg = "#{the_log.size} instead of #{num} queries were executed.#{the_log.size == 0 ? '' : "\nQueries:\n#{the_log.join("\n")}"}"
      STDERR.puts(mesg) unless num == the_log.size
    end
  end

  def assert_no_queries(&block)
    assert_queries(0, :ignore_none => true, &block)
  end

  class SQLCounter
    class << self
      attr_accessor :ignored_sql, :log, :log_all
      def clear_log; self.log = []; self.log_all = []; end
    end

    self.clear_log

    self.ignored_sql = [/^PRAGMA (?!(table_info))/, /^SELECT currval/, /^SELECT CAST/, /^SELECT @@IDENTITY/, /^SELECT @@ROWCOUNT/, /^SAVEPOINT/, /^ROLLBACK TO SAVEPOINT/, /^RELEASE SAVEPOINT/, /^SHOW max_identifier_length/, /^BEGIN/, /^COMMIT/]

    # FIXME: this needs to be refactored so specific database can add their own
    # ignored SQL, or better yet, use a different notification for the queries
    # instead examining the SQL content.
    oracle_ignored     = [/^select .*nextval/i, /^SAVEPOINT/, /^ROLLBACK TO/, /^\s*select .* from all_triggers/im]
    mysql_ignored      = [/^SHOW TABLES/i, /^SHOW FULL FIELDS/]
    postgresql_ignored = [/^\s*select\b.*\bfrom\b.*pg_namespace\b/im, /^\s*select\b.*\battname\b.*\bfrom\b.*\bpg_attribute\b/im]

    [oracle_ignored, mysql_ignored, postgresql_ignored].each do |db_ignored_sql|
      ignored_sql.concat db_ignored_sql
    end

    attr_reader :ignore

    def initialize(ignore = Regexp.union(self.class.ignored_sql))
      @ignore = ignore
    end

    def call(name, start, finish, message_id, values)
      sql = values[:sql]

      # FIXME: this seems bad. we should probably have a better way to indicate
      # the query was cached
      return if 'CACHE' == values[:name]

      self.class.log_all << sql
      self.class.log << sql unless ignore =~ sql
    end
  end

  ActiveSupport::Notifications.subscribe('sql.active_record', SQLCounter.new)
end


#6

@akrzos Note, I haven’t looked at Rails 4 code in this area to see if this code would be compatible when we get to rails 4.


#7

I wonder if this will just produce brittle tests, kinda like the tests we have now that say there should be 34 records. When something changes, the test breaks and we blindly update to be 35 records.

What if we put in benchmark tests to ensure a particular controller runs within a certain amount of time. Would be cool to have a state of the union “Our 5 highest traffic pages are taking 2 seconds in total” And aim to get that number to decrease. #measure #all #things

When people add more features, then it will become evident that there is a tradeoff.


#8

@kbrock @akrzos My thought was a benchmarking tool that could be developed to run SQLCounter.clear_log and ActiveSupport::Notifications.subscribe('sql.active_record', SQLCounter.new) in a worker or process we want to profile, do the “suspect” task(s), report the log of queries, then unsubscribe the notification. I’m sure there are more details that could be found and improved on once that benchmark was started.


#9

@jrafanie

right, but the count of the number of queries just tells you that there was a change.

It seems a black box test of just saying “you have 0.1 seconds to perform this task” would be less brittle and probably convey the true intent better.

If someone adds a query and it brakes, they’ll just up the counter.
If someone adds a query and it takes too long, I don’t think they’ll be so quick to saying, it is ok if this takes longer than 0.1 seconds. It is not their call.


#10

Yeah @kbrock. Query counts is just one metric that @akrzos mentioned wanting to have. Query counts, wall time, cpu time, GC time, MiqQueue get and deliver time, and many more are all valuable to dig deeper into issues.

We have used query counts in the past to identify N+1 queries and related bad performing queries…before things like New Relic and bullet became useable. Even still those are tied to web request/response hooks which is unusable with workers, hence my mention of how to use bullet outside of those hooks.

Stats are all good for giving us insight into what’s going on.

We have to have the information before we can make decisions about how to improve things.


#11

@jrafanie New Relic works for workers as well.
Most of them probably do.

I really enjoyed Nunemaker’s article on ActiveSupport notifications:

http://www.railstips.org/blog/archives/2013/01/23/an-instrumented-library-in-30-lines/