Postgres HA and split brain


#1

Hi,

probably a question for @carbonin. I just watched the Sprint 46 review and noticed the Postgres HA feature. Very nice!!

I was wondering how it deals with network partitions. Say that a network partition occurs where some appliances and the master get separated from other appliances and the slave. How does it prevent the slave being promoted to master while the original master may still be accepting writes?

Thanks,
Geert


#2

So the logic for when/where to fail the database over is handled by repmgrd. You can take a look at the documentation here.

I think what you’re looking for is the section on “witness servers”, which allow a particular network segment to determine whether it has the “voting majority” and should promote a local server.
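To make the “voting majority” idea concrete, here is a minimal sketch of the arithmetic, not repmgrd’s actual code. The function name and the node counts are illustrative assumptions; a segment counts as having the majority only if it can see strictly more than half of all voting nodes (databases plus witness).

```python
def has_voting_majority(visible_nodes: int, total_nodes: int) -> bool:
    """Return True if this segment sees a strict majority of all voting nodes.

    Illustrative helper only; repmgrd's real decision logic is more involved.
    """
    if total_nodes <= 0:
        raise ValueError("total_nodes must be positive")
    if not 0 <= visible_nodes <= total_nodes:
        raise ValueError("visible_nodes must be between 0 and total_nodes")
    # Strict majority: exactly half is not enough, which is why a
    # two-node cluster needs a witness as a tie-breaker.
    return 2 * visible_nodes > total_nodes
```

With a master, a standby, and a witness (three voters), a segment holding the standby and the witness sees 2 of 3 and may promote, while a segment holding only the master sees 1 of 3 and may not. Without the witness, each side of a partition sees 1 of 2 and neither has the majority.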

This has a few implications for our implementation.

  1. We don’t yet have an interface for creating a witness server (but this is planned).
  2. The logic to determine which database the app points to is dependent on losing connection to the original master.

This means that if we have application servers on both of the network segments, we need some more logic to determine what to do in that scenario. This has not yet been addressed because we don’t have a very satisfying answer to that question. For now I would suggest keeping the application servers on a single network segment if at all possible so that they will always have a consensus on what database they are pointing to. When we have a method for servers to communicate without using the database, this problem will be much easier to tackle.

The second thing that you may be interested in is STONITH. We haven’t tackled this yet because how to take down a “misbehaving” node is very infrastructure-specific and difficult to generalize for all use cases. What I would suggest is injecting a script, configured as the promote_command for repmgrd, that handles STONITH before promoting the local node.
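A rough sketch of what such a promote_command wrapper could look like, under some loud assumptions: the fencing command path is hypothetical (it stands in for whatever your infrastructure offers to power off or isolate the old master), and the repmgr config path is a guess for your install. The only point it illustrates is the ordering: fence first, and promote only if fencing succeeded.

```python
import subprocess
from typing import Callable, Sequence

# Hypothetical fencing command: replace with your own infrastructure's tool.
FENCE_CMD = ["/usr/local/bin/fence-old-master"]
# repmgr's promotion command; the config file path here is an assumption.
PROMOTE_CMD = ["repmgr", "standby", "promote", "-f", "/etc/repmgr.conf"]


def fence_then_promote(run: Callable[[Sequence[str]], int] = subprocess.call) -> int:
    """Fence the old master, then promote the local standby.

    Returns the exit code of the first command that fails, or 0 on success.
    The runner is injectable so the ordering can be tested without
    touching a real cluster.
    """
    rc = run(FENCE_CMD)
    if rc != 0:
        # Fencing failed: do not promote, or two masters may accept writes.
        return rc
    return run(PROMOTE_CMD)
```

A script saved around this function would exit with `sys.exit(fence_then_promote())` and be referenced from promote_command in the repmgr configuration. Refusing to promote when fencing fails favors consistency over availability, which may or may not be what you want.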

TL;DR:
These are really difficult problems to solve, and we have to solve them in a way that can be generalized to many different virtualization and networking scenarios. But, we’re working on it :)


#3

Thanks @carbonin.

“These are really difficult problems to solve, and we have to solve them in a way that can be generalized to many different virtualization and networking scenarios. But, we’re working on it”

Yup, it’s definitely not trivial. I had a look at the repmgr documentation. It looks like currently there is no protection against split brain. That’s probably something worth documenting as it could lead to loss of data.

I had a look at the witness stuff, but I wasn’t able to figure out how it protects against split brain given that the database doesn’t check if it has quorum before accepting writes (or does it?)

STONITH/fencing would definitely deal with split brain. But in the case of a network partition it may not work unless you’ve got an independent network connection to the node being fenced. It’s the classic CAP tradeoff, where the customer has to choose between availability and consistency.


#4

This section actually explains that repmgrd does need a “voting majority” in order to promote a master. So using a witness server would solve that, but agreed, STONITH is really the only sure way to go if you can configure it.