Join table convention in ManageIQ


#1

Hi! Let’s talk about :sparkles:join tables:sparkles:

In the Ideal Rails World®

Let’s say we have a Host model and a Storage model. In Rails, Active Record looks for conventionally named table names based on the names of these models. The table names would be hosts and storages, respectively:

class Host < ApplicationRecord
end

class Storage < ApplicationRecord
end

# Host.table_name
#   => "hosts"
# Storage.table_name
#   => "storages"

Now let’s say we we want a many-to-many relationship between these two models. Active Record provides two options: has_and_belongs_to_many and has_many :through. They both do exactly what you think they do, from the outside. The difference between the two has to do with join tables, intermediate models, and semantics.

  class Host < ApplicationRecord
    has_and_belongs_to_many :storages
  end

  class Storage < ApplicationRecord
    has_and_belongs_to_many :hosts
  end

  # host.storages gives me all the storages for that host, woo!
  # storage.hosts gives me all the hosts for that storage, woo!

In this situation, Rails automagically looks for a table called hosts_storages and creates an internal model on the fly to make the association work exactly how you’d expect it to. Rails knows to look for hosts_storages because it’s the plural form of both model names in lexical order.

To set this up, you just write a migration that adds a hosts_storages table using create_join_table(:hosts, :storages). No primary_key is added to this new table (because you don’t need it).
That’s it!

  • The has_many :through association creates a many-to-many relationship with an explicit model/table to be used as the join table.

    Let’s say that there’s another business object called Container. Hosts have access to storages through containers and vice-versa:

  class Host < ApplicationRecord
    has_many :storages, :through => :containers
  end

  class Container < ApplicationRecord
    belongs_to :host
    belongs_to :storage

    def human_readable_name
      "Container #{name}"
    end
  end

  class Storage < ApplicationRecord
    has_many :hosts, :through => :containers
  end

  # host.storages gives me all the storages for that host, woo!
  # storage.hosts gives me all the hosts for that storage, woo!
  # container.human_readable_name uses a 'name' column that I added in `containers`

In this situation, when you say host.storages Rails automagically looks for a model called Container that belongs to the Host (via containers.host_id), looks for a storage association which the belongs_to resolves to containers.storage_id, and lastly initializes a Storage with that id.


Q: “Awesome, Chris! So what’s the difference?”

A: The difference between these two associations is the importance of the join table.

  • In a has_and_belongs_to_many, there’s no explicit business model connecting the two tables. The join table is implicit because it has no Active Record model. It doesn’t have a model because it has no business concern. There are no attributes in this join table beyond just joining the Hosts and Storages.
  • In a has_many :through, Container is a first-class citizen, a model that might have other attributes that we care about in our application. In our example, that’s just this name column that we use in Container#human_readable_name. We might have lots of other features that Container models for us!

Makes sense? Great! Everything I’ve said up to this point is the way it works in the Ideal Rails World®, for completely normal Rails applications. Now let’s talk about how it works in ManageIQ!

In ManageIQ World™

In ManageIQ, we can use a has_many :through completely normally, as in the Ideal Rails World® above.

Great! However…

In ManageIQ, we cannot use a has_and_belongs_to_many association exactly as described in the Ideal Rails World® above.

Q: “…why not, Chris?”

A: Because Nick Carboni, that’s why.
A2: We cannot use a normal join table (used in a HABTM association) without an extra step because replication in ManageIQ requires primary keys on every single table, and Active Record, as stated earlier, normally just creates a primary-keyless join table.

Which, finally, brings us to the point of this little talk. There are three proposed ways of dealing with this issue in ManageIQ:

Option 1: Add a primary key to join tables when using HABTM

Primary keys are a database concern, so we can just manually add a primary key to the created table and be on our merry way.

In our migration, we do NOT use create_join_table* and instead use create_table as normal, which automatically creates a primary key:

class Host < ApplicationRecord
  has_and_belongs_to_many :storages
end

class Storage < ApplicationRecord
  has_and_belongs_to_many :hosts
end

# In the migration...

# NOT create_join_table
create_table :hosts_storages do |t|
  t.bigint :host_id
  t.bigint :storage_id
end

*Note that you probably could pass an option to add a primary key to create_join_table, but let’s keep this simple.

Key points:

  • No explicit Active Record model, as is normal in HABTM. Semantics of HABTM are upheld (It’s Just A Join Table™).
  • Table name is hosts_storages, as expected.

Option 2: ALWAYS use has_many :through, but use a conventionally named join table

If it’s an Active Record model, it has a primary key. You can always use a normal Active Record model as the join model/table for a many-to-many relationship.

In this case, we create a model ourselves with a normalish looking name, just combining the two models’ singular names. Easy to remember! We also specify a conventionally named join table.

class Host < ApplicationRecord
  has_many :storages, :through => :host_storages
end

class HostStorage < ApplicationRecord
  self.table_name = 'hosts_storages'

  belongs_to :host
  belongs_to :storage
end

class Storage < ApplicationRecord
  has_many :hosts, :through => :host_storages
end

# In the migration...

create_table :hosts_storages do |t|
  t.bigint :host_id
  t.bigint :storage_id
end

Key points:

  • Explicit Active Record model, even though It’s Just A Join Table™
  • You MUST specify self.table_name in the join model if you want to name it SingularSingular but still have the conventional plural_plural table name.

Option 3: ALWAYS use has_many :through, never use Just A Join Table™

This is the same as Option 2, except we go under the logic that “If it’s backed by a Active Record model, it’s NOT Just A Join Table™ so use the table name the model expects.”

class Host < ApplicationRecord
  has_many :storages, :through => :host_storages
end

class HostStorage < ApplicationRecord
  belongs_to :host
  belongs_to :storage
end

class Storage < ApplicationRecord
  has_many :hosts, :through => :host_storages
end

# In the migration...

create_table :host_storages do |t| # Note it's not 'hosts_storages'
  t.bigint :host_id
  t.bigint :storage_id
end

Key points:

  • Explicit Active Record model, even though It’s Just A Join Table™
  • host_storages is now the join table name, even though under normal circumstances I’d expect it to be hosts_storages by just looking at db tables.

Q: “So uh, who cares, Chris?”

A: People who are tired of the confusing 20 minute conversation we have on Gitter explaining to someone how to make a join table, that’s who.

I wrote this up not to try and make some pedantic point about which is more correct, but to seek community input regarding which method we should be consistently using.
Fun fact, 2/3 of these methods are found within ManageIQ today, and the 3rd has been tried before. That means whenever anyone looks for an example of how to make a join table
in our application they find multiple weird ways of doing it. This leads to a long Gitter conversation reexplaining 3 different ways of doing it (there’s actually yet another
method that I haven’t described here, can you guess it?). This is wasteful and confusing.

Voting time!

So let’s pick! Please reply to this posting with which option, 1-3, you’d like to see. If you don’t care, that’s great! Just don’t reply.

Q: “Ok smarty pants Chris, what do YOU think?”

A: I see Option 1 as the clear winner for me. There’s literally nothing different about this from normal Rails applications beyond making sure that the join table you use has primary keys for replication. That’s it. Everything is the same. Plus, less code! (no explicit AR model to solely join two other models) Again though, the main point for me is just that there’s a single convention that we decide on.


#2

So, a long while ago Matthew suggested we always do 2 or 3. One reason is that as soon as we need to “upgrade” a join table with some feature, then all the coding is in place. It is fitting you chose HostStorage because that is the specific join table that sparked the conversation. In that case we needed a “read_only” flag on the join table itself.


#3

So I think I’ve found your 4th option and that’s to name the class so that the join table looks like a join table.

So:

class Host < ApplicationRecord
  has_many :storages, :through => :hosts_storages
end

class HostsStorage < ApplicationRecord
  belongs_to :host
  belongs_to :storage
end

class Storage < ApplicationRecord
  has_many :hosts, :through => :hosts_storages
end

# In the migration...

create_table :hosts_storages do |t|
  t.bigint :host_id
  t.bigint :storage_id
end

This way your :through matches the table name.

That said, I generally prefer less magic so I’ll agree with @Fryguy :point_up:


#4

Well, #itDepends™

Option 1 makes sense when you have a join table that is strictly there to facilitate a many-to-many relationship between two models and nothing more. There is a lot of extra weight in “ApplicationRecord” (used on pretty much all of our models) compared to the “Ideal Rails World®”, and so I think that if there is nothing of value by adding a model that will never be used and add extra memory bloat to the application.

That said, if you are even thinking of using the middle table for anything other than a dumb linking table, then option 2-3 really makes the most sense. I would really see the above models as Host, Storage and StorageArray (versus HostStorage) where we would be seeing use of being able to query against the StorageArray directly to gain info about it for the application.


#5

I think you just summed up what I said :smiley:

I would really see the above models as Host, Storage and StorageArray (versus HostStorage) where we would be seeing use of being able to

That’s voting for Option 3 (never a join table, if it’s a model, it’s a first class citizen)


#6

Thanks for doing this @chrisarcand!

I think option 1 is ok for simple join tables but how often do you know when things will always be simple?

Option 3 seems the most future proof and less magic. I’m afraid if we don’t follow a table naming convention in option 2 as required in option 3, we might not be consistent. So, option 3 is less magic but requires us to follow a table naming convention.


#7

Excellent writeup @chrisarcand.
I vote for 3, because we end up with information on the join table in almost all cases.
Plus it requires people to think about naming things.


#8

Thanks for the writeup @chrisarcand!

I’m going to vote for option 3 with a slight change. In your example I would name the join model like a first class citizen. Something like HostStorageConnection and its table would therefore be host_storage_connections.

  • It has a primary key for replication
  • We can standardize on has_many :through
  • It allows for other metadata (i.e. read-only)
  • There is no confusion on the tablename
  • Adding logic in the future is easy

#9

I’m voting for Option 3 with table name as host_storages.

Other than reasons suggested by other folks, it seems to me Rails agrees enough that it needs no further hints (aka self.tablename) to figure out the tablename.

There is a view that sees this convention not as singular-plural, but rather as the plural form of the JoinedEntity, i.e. (host_storage)s


#10

@chrisarcand I might be reading it wrong (or perhaps understand rails wrong?), but I think the examples are missing some pieces that might sway people’s opinions. For example, in number 2, you can’t just do a has_many :storages :through => :hosts_storages without also defining the :hosts_storages . Defining the :hosts_storages association is really the nasty part, particular if Rails can’t deterine the model name from the association (which I think would happen in number 3). That ugliness factor is being left out. Can you update the OP with this?

example:

class Host < ApplicationRecord
  has_many :host_storages
  has_many :storages, :through => :host_storages
end

#11

After re-reading, I think @NickLaMuro and I are saying the same thing (just suggesting different names for the model).


#12

My vote goes for number 3, so it looks like:

class Host < ApplicationRecord
  has_many :host_storages
  has_many :storages, :through => :host_storages
end

FYI: there was a bug in HABTM while ago, which I didn’t find after like 1.5 days of trying, so I rather moved to using has_many :through that worked, which was the plan anyway at that time. HABTM had issues, when there was a complex join of like 8 tables. Not sure if anyone have fixed it. :slight_smile:


#13

Seems like number 3 is the choice of most people, so let’s run with that.

So, that means,

  • Table name is singular_plural (e.g. host_storages)
  • Model name is SingularSingular (e.g. HostStorage)

The model name is a first-pass name because we don’t know what actual thing it might entail in the future, and it can be changed later. As per @NickLaMuro and @bdunne suggestions, the model may actually be changed later to encapsulate a concept, for example StorageArray, and the table name can possibly changed accordingly at that time.