How do I debug a replication “down” issue with regional DBs?


#1

Hi
The following screenshot is from master region 99. The status column shows “down”.

Actions --> Validate shows "Subscription Credentials validated successfully".

My question is: how can I debug this issue from the command line?
Which log file shows the traffic between the global and remote regions?


#2

@tjyang All of the logging for replication should be in the postgres logs, so look in $APPLIANCE_PG_DATA/pg_log/* on your database servers (global and remote regions).
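
As a rough way to do this from the command line, the sketch below walks a pg_log directory and prints lines that either carry an ERROR/FATAL level or mention replication. It assumes the APPLIANCE_PG_DATA environment variable is set in your shell and that the log lines put the level between colons, as in ":ERROR:". The keyword list and the function name find_replication_lines are illustrative choices, not part of the appliance.

```python
import os
from pathlib import Path

# Assumption: these keywords are a guess at what marks a replication-related line.
KEYWORDS = ("pglogical", "subscription", "apply worker")


def find_replication_lines(log_dir, levels=("ERROR", "FATAL")):
    """Return (file name, line number, text) for error or replication-related lines."""
    hits = []
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue
        with path.open(errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                # Assumption: the level appears as ":ERROR:" or ":FATAL:" in each line.
                is_error = any(f":{level}:" in line for level in levels)
                mentions_replication = any(word in line for word in KEYWORDS)
                if is_error or mentions_replication:
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    data_dir = os.environ.get("APPLIANCE_PG_DATA")
    if data_dir:
        for name, number, text in find_replication_lines(Path(data_dir) / "pg_log"):
            print(f"{name}:{number}: {text}")
```

Run it on both the global and the remote database servers, since either side may have logged the failure.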

The issue will likely be somewhere quite a bit in the past if you’re at 15GB of backlog. If you can’t find anything it may be best to remove and re-add the subscription.

If you installed the latest version of postgresql (I think anything past 9.5.14) you may need to reinstall the pglogical package.


#3

I deleted all the remote regions and enabled only region 2 on the master.

But the status went from “initializing” to “down”.

Here are the logs related to region 2 in /var/opt/rh/rh-postgresql95/lib/pgsql/data/pg_log:

x EDT:xx@:[29598]:LOG:  starting apply for subscription region_2_subscription
x EDT:xx@:[29598]:ERROR:  no data left in message
x EDT:xx@:[29598]:LOG:  apply worker [29598] at slot 1 generation 40523 exiting with error
x EDT:xx:[1237]:LOG:  worker process: pglogical apply 16386:286034136 (PID 29598) exited with exit code 1
x EDT:xx@:[29654]:LOG:  starting apply for subscription region_2_subscription
x EDT:xx@:[29654]:ERROR:  no data left in message
x EDT:xx@:[29654]:LOG:  apply worker [29654] at slot 1 generation 40524 exiting with error
x EDT:xx:[1237]:LOG:  worker process: pglogical apply 16386:286034136 (PID 29654) exited with exit code 1
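
The excerpt above shows the apply worker starting, failing, and being restarted with a new PID. A minimal sketch for summarizing such a loop is given here: it ties each ERROR line to a subscription through the worker PID and counts starts and distinct error messages. The regular expressions assume the line layout shown in this excerpt ("[PID]:LEVEL:  message"); a different log_line_prefix would need different patterns, and the function name summarize_apply_errors is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: lines look like "...:[29598]:ERROR:  message", as in the excerpt above.
LINE = re.compile(r"\[(?P<pid>\d+)\]:(?P<level>[A-Z]+):\s+(?P<msg>.*)$")
START = re.compile(r"starting apply for subscription (?P<name>\S+)")

SAMPLE = [
    "x EDT:xx@:[29598]:LOG:  starting apply for subscription region_2_subscription",
    "x EDT:xx@:[29598]:ERROR:  no data left in message",
    "x EDT:xx@:[29598]:LOG:  apply worker [29598] at slot 1 generation 40523 exiting with error",
    "x EDT:xx:[1237]:LOG:  worker process: pglogical apply 16386:286034136 (PID 29598) exited with exit code 1",
]


def summarize_apply_errors(lines):
    """Count apply starts and error messages per subscription, matched by worker PID."""
    pid_to_subscription = {}
    starts = defaultdict(int)
    errors = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = LINE.search(line)
        if not match:
            continue
        pid, level, msg = match["pid"], match["level"], match["msg"]
        started = START.search(msg)
        if started:
            # Remember which subscription this worker PID belongs to.
            pid_to_subscription[pid] = started["name"]
            starts[started["name"]] += 1
        elif level in ("ERROR", "FATAL") and pid in pid_to_subscription:
            errors[pid_to_subscription[pid]][msg] += 1
    return {
        name: {"starts": count, "errors": dict(errors[name])}
        for name, count in starts.items()
    }


if __name__ == "__main__":
    print(summarize_apply_errors(SAMPLE))
```

A start count that keeps climbing with the same error message each time suggests the worker is stuck in a restart loop rather than failing once.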

#4

With @carbonin’s pointer and some googling, I was able to get 3 regions connected successfully.



#5

Can you share some useful hints, please?