Hi! Let’s talk about join tables
In the Ideal Rails World®
Let’s say we have a Host model and a Storage model. In Rails, Active Record looks for conventionally named tables based on the names of these models. The table names would be hosts and storages, respectively:
class Host < ApplicationRecord
end
class Storage < ApplicationRecord
end
# Host.table_name
# => "hosts"
# Storage.table_name
# => "storages"
Now let’s say we want a many-to-many relationship between these two models. Active Record provides two options: has_and_belongs_to_many and has_many :through. From the outside, they both do exactly what you think they do. The difference between the two has to do with join tables, intermediate models, and semantics.
- The has_and_belongs_to_many association creates a many-to-many relationship through an implicit join table.
class Host < ApplicationRecord
  has_and_belongs_to_many :storages
end
class Storage < ApplicationRecord
  has_and_belongs_to_many :hosts
end
# host.storages gives me all the storages for that host, woo!
# storage.hosts gives me all the hosts for that storage, woo!
In this situation, Rails automagically looks for a table called hosts_storages and creates an internal model on the fly to make the association work exactly how you’d expect it to. Rails knows to look for hosts_storages because it’s the plural form of both model names, joined with an underscore in lexical order.
To set this up, you just write a migration that adds a hosts_storages table using create_join_table(:hosts, :storages). No primary key is added to this new table (because you don’t need it).
That’s it!
- The has_many :through association creates a many-to-many relationship with an explicit model/table to be used as the join table.
Let’s say that there’s another business object called Container. Hosts have access to storages through containers and vice versa:
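class Host < ApplicationRecord
  has_many :containers # the :through association must itself be declared
  has_many :storages, :through => :containers
end
class Container < ApplicationRecord
  belongs_to :host
  belongs_to :storage
  def human_readable_name
    "Container #{name}"
  end
end
class Storage < ApplicationRecord
  has_many :containers # the :through association must itself be declared
  has_many :hosts, :through => :containers
end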
# host.storages gives me all the storages for that host, woo!
# storage.hosts gives me all the hosts for that storage, woo!
# container.human_readable_name uses a 'name' column that I added in `containers`
In this situation, when you say host.storages, Rails conceptually looks for the Container model that belongs to the Host (via containers.host_id), looks for that container’s storage association, which the belongs_to resolves to containers.storage_id, and lastly initializes a Storage with that id.
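The lookup described above can be sketched in plain Ruby, with no Rails involved. This is only an illustration of the idea, not how Active Record is implemented (it builds a SQL join instead). The HostRow, StorageRow, and ContainerRow structs, the sample data, and the storages_for and hosts_for helpers are all hypothetical names made up for this sketch.

```ruby
# Plain-Ruby stand-ins for database rows (hypothetical, no Active Record).
HostRow      = Struct.new(:id, :name)
StorageRow   = Struct.new(:id, :name)
ContainerRow = Struct.new(:id, :name, :host_id, :storage_id)

HOSTS = [HostRow.new(1, "host-a"), HostRow.new(2, "host-b")]
STORAGES = [
  StorageRow.new(10, "fast"),
  StorageRow.new(11, "slow"),
  StorageRow.new(12, "archive")
]
# Each container row links one host to one storage and carries its own data.
CONTAINERS = [
  ContainerRow.new(100, "c1", 1, 10),
  ContainerRow.new(101, "c2", 1, 11),
  ContainerRow.new(102, "c3", 2, 11)
]

# Roughly what host.storages resolves to: match containers.host_id,
# collect containers.storage_id, then pick the storages with those ids.
def storages_for(host)
  storage_ids = CONTAINERS.select { |c| c.host_id == host.id }.map(&:storage_id)
  STORAGES.select { |s| storage_ids.include?(s.id) }
end

# The same walk in the other direction, for storage.hosts.
def hosts_for(storage)
  host_ids = CONTAINERS.select { |c| c.storage_id == storage.id }.map(&:host_id)
  HOSTS.select { |h| host_ids.include?(h.id) }
end

puts storages_for(HOSTS[0]).map(&:name).inspect
puts hosts_for(STORAGES[1]).map(&:name).inspect
```

Note that the container rows have their own name column here, which is exactly the kind of extra data a bare join table would not carry.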
Q: “Awesome, Chris! So what’s the difference?”
A: The difference between these two associations is the importance of the join table.
- In a has_and_belongs_to_many, there’s no explicit business model connecting the two tables. The join table is implicit because it has no Active Record model. It doesn’t have a model because it has no business concern. There are no attributes in this join table beyond just joining the Hosts and Storages.
- In a has_many :through, Container is a first-class citizen, a model that might have other attributes that we care about in our application. In our example, that’s just this name column that we use in Container#human_readable_name. We might have lots of other features that Container models for us!
Makes sense? Great! Everything I’ve said up to this point is the way it works in the Ideal Rails World®, for completely normal Rails applications. Now let’s talk about how it works in ManageIQ!
In ManageIQ World™
In ManageIQ, we can use a has_many :through completely normally, as in the Ideal Rails World® above.
Great! However…
In ManageIQ, we cannot use a has_and_belongs_to_many association exactly as described in the Ideal Rails World® above.
Q: “…why not, Chris?”
A: Because Nick Carboni, that’s why.
A2: We cannot use a normal join table (used in a HABTM association) without an extra step because replication in ManageIQ requires primary keys on every single table, and Active Record, as stated earlier, normally just creates a primary-keyless join table.
Which, finally, brings us to the point of this little talk. There are three proposed ways of dealing with this issue in ManageIQ:
Option 1: Add a primary key to join tables when using HABTM
Primary keys are a database concern, so we can just manually add a primary key to the created table and be on our merry way.
In our migration, we do NOT use create_join_table* and instead use create_table as normal, which automatically creates a primary key:
class Host < ApplicationRecord
  has_and_belongs_to_many :storages
end
class Storage < ApplicationRecord
  has_and_belongs_to_many :hosts
end
# In the migration...
# NOT create_join_table
create_table :hosts_storages do |t|
  t.bigint :host_id
  t.bigint :storage_id
end
*Note that you probably could pass an option to add a primary key to create_join_table, but let’s keep this simple.
Key points:
- No explicit Active Record model, as is normal in HABTM. Semantics of HABTM are upheld (It’s Just A Join Table™).
- Table name is hosts_storages, as expected.
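Why hosts_storages is the expected name can be sketched in a few lines of plain Ruby. This is a simplified approximation of the naming convention described earlier (both plural table names, sorted lexically, joined with an underscore). Rails’ real rule handles more cases, such as table names that share a prefix or an explicit :join_table option. The habtm_join_table_name helper is a hypothetical name for this sketch, not a Rails method.

```ruby
# Simplified sketch of the HABTM join table naming convention:
# sort the two plural table names lexically and join them with "_".
# (Hypothetical helper; Rails' own rule covers more edge cases.)
def habtm_join_table_name(first_table, second_table)
  [first_table, second_table].map(&:to_s).sort.join("_")
end

# The order the tables are passed in does not matter.
puts habtm_join_table_name(:storages, :hosts)
puts habtm_join_table_name(:hosts, :storages)
```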
Option 2: ALWAYS use has_many :through, but use a conventionally named join table
If it’s an Active Record model, it has a primary key. You can always use a normal Active Record model as the join model/table for a many-to-many relationship.
In this case, we create a model ourselves with a normalish looking name, just combining the two models’ singular names. Easy to remember! We also specify a conventionally named join table.
class Host < ApplicationRecord
  has_many :host_storages # the :through association must itself be declared
  has_many :storages, :through => :host_storages
end
class HostStorage < ApplicationRecord
  self.table_name = 'hosts_storages'
  belongs_to :host
  belongs_to :storage
end
class Storage < ApplicationRecord
  has_many :host_storages # the :through association must itself be declared
  has_many :hosts, :through => :host_storages
end
# In the migration...
create_table :hosts_storages do |t|
  t.bigint :host_id
  t.bigint :storage_id
end
Key points:
- Explicit Active Record model, even though It’s Just A Join Table™
- You MUST specify self.table_name in the join model if you want to name it SingularSingular but still have the conventional plural_plural table name.
Option 3: ALWAYS use has_many :through, never use Just A Join Table™
This is the same as Option 2, except we follow the logic that “If it’s backed by an Active Record model, it’s NOT Just A Join Table™, so use the table name the model expects.”
class Host < ApplicationRecord
  has_many :host_storages # the :through association must itself be declared
  has_many :storages, :through => :host_storages
end
class HostStorage < ApplicationRecord
  belongs_to :host
  belongs_to :storage
end
class Storage < ApplicationRecord
  has_many :host_storages # the :through association must itself be declared
  has_many :hosts, :through => :host_storages
end
# In the migration...
create_table :host_storages do |t| # Note it's not 'hosts_storages'
  t.bigint :host_id
  t.bigint :storage_id
end
Key points:
- Explicit Active Record model, even though It’s Just A Join Table™
- host_storages is now the join table name, even though under normal circumstances I’d expect it to be hosts_storages by just looking at db tables.
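The naming mismatch between Options 2 and 3 comes from how a model’s default table name is derived: the class name is underscored and then pluralized as a whole. The sketch below imitates that in plain Ruby. It is a deliberately naive stand-in (it only appends an "s" and knows nothing about irregular plurals), and naive_default_table_name is a hypothetical helper, not part of Rails.

```ruby
# Naive imitation of how a model name maps to its default table name:
# CamelCase -> snake_case, then pluralize the whole thing by appending "s".
# (Hypothetical helper; real Rails inflection handles irregular plurals.)
def naive_default_table_name(model_name)
  underscored = model_name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  "#{underscored}s"
end

puts naive_default_table_name("Host")
puts naive_default_table_name("HostStorage")

# Only the last word is pluralized, so the model's default name differs
# from the plural_plural join table convention.
puts naive_default_table_name("HostStorage") == "hosts_storages"
```

This is why Option 2 needs self.table_name to keep hosts_storages, while Option 3 simply accepts host_storages.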
Q: “So uh, who cares, Chris?”
A: People who are tired of the confusing 20-minute conversation we have on Gitter explaining to someone how to make a join table, that’s who.
I wrote this up not to try and make some pedantic point about which is more correct, but to seek community input regarding which method we should be consistently using.
Fun fact: two of these three methods are found within ManageIQ today, and the third has been tried before. That means whenever anyone looks for an example of how to make a join table in our application, they find multiple weird ways of doing it. This leads to a long Gitter conversation re-explaining three different ways of doing it (there’s actually yet another method that I haven’t described here, can you guess it?). This is wasteful and confusing.
Voting time!
So let’s pick! Please reply to this post with the option, 1-3, that you’d like to see. If you don’t care, that’s great! Just don’t reply.
Q: “Ok smarty pants Chris, what do YOU think?”
A: I see Option 1 as the clear winner for me. There’s literally nothing different about this from normal Rails applications beyond making sure that the join table you use has primary keys for replication. That’s it. Everything is the same. Plus, less code! (no explicit AR model to solely join two other models) Again though, the main point for me is just that there’s a single convention that we decide on.