High Availability Setup


#1

I am trying to deploy an HA setup of ManageIQ Hammer. I am able to get the databases set up as primary/standby by following the instructions here:

However, after I reboot my primary DB server, my standby server never switches from standby to primary.

Is there something I am missing?


#2

It’s possible that the primary database restarts faster than the threshold for failover.

Try stopping the postgres service on the primary to trigger a failover. You can also check the /var/log/repmgrd directory for logs related to the failover on the standby.
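
A minimal sketch for scanning those logs on the standby, assuming the directory holds plain-text log files. The helper name `failover_log_lines` is made up for illustration, and the search pattern is only a guess at useful keywords, not a list of exact repmgrd messages.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every line that mentions failover or promotion
# from the files under a log directory (default: /var/log/repmgrd).
failover_log_lines() {
    local log_dir="${1:-/var/log/repmgrd}"
    # -r: recurse into the directory, -i: ignore case,
    # -n: show line numbers, -E: extended regex
    grep -rinE 'failover|promot' "$log_dir"
}

# Example, run on the standby after stopping postgres on the primary:
#   failover_log_lines /var/log/repmgrd
```

If nothing is printed, either the standby never attempted a failover or the log uses different wording, so it is worth reading the newest file directly as well.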


#3

@bsockel

Instead of an OS reboot, I just use the following commands:

su - postgres -c "repmgr cluster show" # as root user on one of two db nodes.
systemctl stop rh-postgresql95-postgresql # On node with Primary in above output
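
To avoid reading the table by eye, the first command's output can be filtered. This is a rough sketch that assumes the pipe-separated layout `repmgr cluster show` prints (ID, Name, Role, ...); `primary_nodes` is a made-up helper name, not part of repmgr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `repmgr cluster show` output on stdin and print
# the Name column of every row whose Role column is exactly "primary".
primary_nodes() {
    awk -F'|' '
        NF >= 7 {                       # data and header rows have 7 columns
            name = $2; role = $3
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", role)
            if (role == "primary") print name   # header row has role "Role"
        }
    '
}

# Example, as root on one of the db nodes:
#   su - postgres -c "repmgr cluster show" | primary_nodes
```

More than one line of output means more than one node is registered as primary, which is worth sorting out before stopping anything.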


#4

Thanks for the suggestions. After further testing, these are a couple of things that I have noticed:

  • After configuring the primary DB for replication, the repmgr service does not start automatically; I have to start it manually. On the secondary server the service is enabled and starts normally. Is this normal?

  • After stopping and restarting the primary server, repmgr shows this on the primary DB:

    [root@mylab-db1 ~]# su - postgres -c "repmgr cluster show"
     ID | Name          | Role    | Status               | Upstream      | Location | Connection string
    ----+---------------+---------+----------------------+---------------+----------+-----------------------------------------------------
     10 | 10.20.101.176 | primary | * running            |               | default  | host=10.20.101.176 user=root dbname=vmdb_production
     20 | 10.20.101.177 | standby | ! running as primary | 10.20.101.176 | default  | host=10.20.101.177 user=root dbname=vmdb_production

    On secondary it shows:
     ID | Name          | Role    | Status    | Upstream | Location | Connection string
    ----+---------------+---------+-----------+----------+----------+-----------------------------------------------------
     10 | 10.20.101.176 | primary | ! running |          | default  | host=10.20.101.176 user=root dbname=vmdb_production
     20 | 10.20.101.177 | primary | * running |          | default  | host=10.20.101.177 user=root dbname=vmdb_production

    WARNING: following issues were detected
    - node "10.20.101.176" (ID: 10) is running but the repmgr node record is inactive
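
Both views above show two nodes acting as primary at once. A small sketch for spotting that state from the same output, assuming the column layout shown; `acting_primary_count` is a made-up helper name, and the matching rules are my own reading of the Role and Status columns, not documented repmgr behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `repmgr cluster show` output on stdin and print how
# many rows look like a running primary. A row counts if its Role is "primary"
# and its Status ends in "running", or if its Status says "running as primary".
acting_primary_count() {
    awk -F'|' '
        NF >= 7 {
            role = $3; status = $4
            gsub(/^[ \t]+|[ \t]+$/, "", role)     # trim surrounding whitespace
            gsub(/^[ \t]+|[ \t]+$/, "", status)
            if ((role == "primary" && status ~ /running$/) ||
                status ~ /running as primary/) n++
        }
        END { print n + 0 }                       # print 0 when nothing matched
    '
}

# Example, as root on either db node:
#   su - postgres -c "repmgr cluster show" | acting_primary_count
```

Any result other than 1 suggests the cluster needs attention before the application can be expected to follow the database.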

What is the correct process for failing back over to the primary?

How can I check the application failover monitor config? The summary screen always seems to show the primary server. Whenever I fail the primary DB server by stopping the SQL service, I get an error page when attempting to access ManageIQ.

Thanks