Can we speed up small repos on Travis?


#1

A great outcome of the repo splits is that provider repos now have small test suites.
E.g. rake in manageiq-providers-kubernetes takes ~1m20s locally.
Yet on Travis it takes >7min!

That means that Travis spends most of its time doing something other than running tests!
=> Can we shave off any of that overhead?

  • 21s for rvm. “ruby-2.3.3 is not installed - installing”, then it downloads a 93MB binary tarball from S3. Surely this could be cacheable?
  • 211s for bin/setup. Bundler fetches ~23 GitHub repos, then >200 rubygems.
    • does Travis have any git-cloning delta optimization?
    • can the gems be cached? (see the .travis.yml sketch after this list)
    • bundler recently got a parallel install flag; are we using it?
  • 149s running rake. rspec reports “Finished in 1 minute 33.8 seconds (files took 8.54 seconds to load)”; where did the other >50s go?
    • Submitting to Coveralls? BTW, “[Coveralls] Couldn’t find a repository matching this job.” doesn’t sound good.
      Anyway, if Coveralls is the culprit (I don’t have timing, I just don’t see anything else in the log), can we make Travis go green/red before submitting?
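For concreteness, here is roughly what the Bundler caching I’m asking about looks like in .travis.yml. This is only a sketch, not our current config; since we drive Bundler through bin/setup, the explicit directories form may be what we would actually need:

```yaml
# Rough sketch of Travis's built-in Bundler caching (not our current config).
language: ruby
rvm:
  - 2.3.3
cache: bundler        # caches installed gems between builds

# If Bundler runs through bin/setup rather than Travis's default install
# step, an explicit directory cache may be needed instead, e.g.:
# cache:
#   directories:
#     - vendor/bundle
```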

#2

A few comments and additional suggestions:

  • I think we could probably find a way to disable building assets when running bin/setup on some provider repos, probably via an ENV var (see the sketch after this list). Although it is done in parallel in a Thread within bin/setup, it likely eats a large chunk of network bandwidth downloading assets that are never used when running the tests.
  • Bundler already has optimizations built in for how much it clones when it pulls in git gems, so that side is largely in place. I will spare the explanation, but I don’t really think there is much to improve on that front.
  • Travis does have a bundler cache, but we might have to manually activate it since we run bundler through bin/setup; I would have to look. That said, with how quickly the git gems change, that cache will probably always be invalid, which stinks (it just uses an MD5 sum to figure out if the tar’d cache is valid)… we might be able to cache only the non-git gems, but again, this needs more research into how the caching is handled and how to accomplish that.
  • The parallel flag might help with bundler, but it will probably only help so much, since I think we only get two concurrent processes per Travis worker (this is a public CI, so they do have restrictions).
  • While not logged here, code-climate might also be contributing to the rake time.
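To illustrate the first point, roughly what that could look like in .travis.yml. The variable name here is made up; bin/setup would have to be taught to honor it:

```yaml
# Hypothetical: tell bin/setup to skip downloading/building UI assets on
# provider repos whose test suites never touch them.
env:
  global:
    - SKIP_ASSETS=true   # made-up flag name; bin/setup would need to check it
```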

downloads a 93MB binary tarball from S3. Surely this could be cacheable?

Well, the thing is, that probably already is the cached version… so I’m not sure there is much more we can do. The Docker images that Travis uses are probably as lightweight and universal as possible, and I don’t think you can create your own for use with the sudo: false config (you can run Docker on the VM-based infrastructure, but I don’t think that would save us time, since our Docker image is 2+ gigs…).

Anyway, if Coveralls is the culprit (I don’t have timing, I just don’t see anything else in the log), can we make Travis go green/red before submitting?

Based on the Travis Build Lifecycle, you would think that after_script would work for it, but I don’t think it can be used to speed up how quickly the report is sent to Coveralls (based on this), so I think you are out of luck if that is indeed the culprit.
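For reference, a stripped-down view of the lifecycle ordering in .travis.yml terms. The point is that the job is only marked green/red once every phase, including after_script, has finished, so moving the Coveralls submission there would not surface the result any sooner:

```yaml
# Abridged Travis job lifecycle; phases run top to bottom, and the job is
# only reported as finished (green/red) after the last phase completes.
install: bin/setup                # our bundle install happens in here
script: bundle exec rake          # this phase determines pass/fail
after_success: echo "coverage submission could live here"   # does not change pass/fail
after_script: echo "cleanup"      # still runs before the job is marked finished
```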

I would say optimizing bin/setup is the best course of action, as it has the most potential savings available. Trying to speed up rvm in this case is probably a lost cause and wouldn’t shave much off the build time, unless you know specifically what the culprit is.


#3

As of the past few weeks, there is no longer a default Ruby installed in the base Docker images. This is slowing down Travis builds by a bunch (the rvm line). I don’t think we can really change that number too much besides giving feedback to TravisCI.

I think Nick and you pointed out that the biggest cost, and probably the biggest savings, is in bin/setup.

That script used to be much simpler – a shell script. Over time it has gotten quite complicated and become a Ruby script. A month or two ago I wanted to remove something that was run twice, but it was too difficult and it was easier to just punt. I wish I remembered more about what I thought needed to be tweaked.

I still feel the git Ruby gems are a problem, but Nick is probably right that, with the amount of change, the git-update route is quicker.

If we are splitting out various repos, it would seem we could reduce the number of necessary gems.

I wonder if our database setup could be simplified. For many gems we do not set up a database, but for others it may be our issue. Seeding is still quite slow and very query-intensive.

I don’t think Coveralls is a big time sink. I’m curious about just getting that data from CodeClimate and not using Coveralls.


#4

So, we already do have bundler gem caching enabled, but it looks like it’s not working properly. We should fix that ASAP.

Compare https://travis-ci.org/ManageIQ/manageiq/jobs/253211871#L269 to https://travis-ci.org/ManageIQ/manageiq-providers-kubernetes/jobs/253200644#L261

Seems like that is not enabled either, and I have no idea why… I could have sworn it was there. I’ll make a PR for that.

We’ve switched to codeclimate-test-reporter in other repos, and probably will here as well. I’m surprised it’s not sending currently, but that’s probably because we haven’t enabled it on Coveralls for some reason. I’ll take a look at this.
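If we do switch this repo over, the Travis-side wiring for the legacy codeclimate-test-reporter gem is roughly just the repo token, shown here as a placeholder (a real config would use an encrypted variable), with the gem itself hooked in via SimpleCov from the spec helper:

```yaml
# Sketch only: the legacy codeclimate-test-reporter gem reads its token
# from the environment; a real config would use an encrypted variable.
env:
  global:
    - CODECLIMATE_REPO_TOKEN=<placeholder>
```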


#5

BTW, the base Ubuntu image will be upgraded soon (or we can opt in/out).


#6

@cben https://github.com/ManageIQ/manageiq/pull/15585 should fix all of:

  • parallel install
  • travis retries on failure
  • bundler gem caching
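
For reference, a rough sketch of how those three items usually show up in .travis.yml (the actual PR may differ in details):

```yaml
# Sketch; the actual PR may differ in details.
cache: bundler                     # bundler gem caching
bundler_args: --jobs=3 --retry=3   # parallel install + retries on failure
```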