Postgres read replicas are commonly used not only to distribute query load amongst multiple nodes, but also to ensure high availability (HA) of the database. If the primary node of a Postgres cluster fails, a read replica can be promoted to be the new primary, handling write (and read) requests from then on.
Prior to Postgres version 16, read replicas (or standby servers) couldn’t be used at all for logical replication. Logical replication is a method for replicating data from a Postgres publisher to subscribers. These subscribers can be other Postgres instances, as well as non-Postgres tools, such as Debezium, which use logical replication for change data capture (CDC). Logical replication slots—which keep track of how far a specific subscriber has consumed the database’s change event stream—could only be created on the primary node of a Postgres cluster. This meant that after a failover from primary to replica you’d have to create a new replication slot and typically also start with a new initial snapshot of the data. Otherwise, you might miss change events which occurred after the last read from the slot on the old primary and before the creation of the slot on the new primary.
Whilst external tools such as pg_failover_slots emerged over time to fill this gap, a built-in solution for failing over replication slots from one Postgres node to another was sorely missed by many users. This situation substantially improved with the release of Postgres 16, which brought support for setting up replication slots on read replicas. I’ve discussed this feature in great detail in this post. Back then, I also explored how to use replication slots on replicas for manually implementing replication slot failover. The good news is that, as of Postgres version 17, all this is not needed any longer, as Postgres finally supports failover slots out of the box!
Failover slots are replication slots which are created on the primary node and which are propagated automatically to read replicas. Their state on the replica is kept in sync with the upstream slot on the primary, which means that after a failover, when promoting a replica to primary, you can continue to consume the slot on the new primary without the risk of missing any change events. This is really great news for using tools like Debezium (and by extension, data platforms such as Decodable, which provides a fully-managed Postgres CDC connector based on Debezium) in HA scenarios, so let’s take a closer look at how this works.
Hello, Failover Slots!
Let’s start by exploring failover slots solely from the perspective of using Postgres’ SQL interface to logical replication. If you want to follow along, check out this project from the Decodable examples repository. It contains a Docker Compose file which brings up a Postgres primary and a read replica (based on this set-up by Peter Eremeykin, which makes it really easy to spin up an ephemeral Postgres cluster for testing purposes), as well as a few other components which we’ll use later on. Start everything by running:
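(With a checkout of the examples repository, this boils down to the following; the directory of the project depends on your local copy.)

```sh
# Start the Postgres primary and replica, pgbouncer, and the ngrok agent in the background
docker compose up -d
```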
Primary and replica are synchronized via a physical replication slot (i.e. unlike with logical replication, all the WAL segments are replicated), ensuring that all data changes done on the primary are replicated to the replica immediately. Get a session on the primary (I’m going to use pgcli, which is my favorite Postgres CLI client, but psql will do the trick as well):
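(Host, port, user, and database name are assumptions about the example set-up; adjust them to your environment.)

```sh
pgcli -h localhost -p 5432 -U postgres -d inventorydb
```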
The prompt has been adjusted to show the current instance. To verify you are on the primary and not the replica, run the following:
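(The <span class="inline-code">pg_is_in_recovery()</span> function returns false on the primary and true on a standby:)

```sql
SELECT pg_is_in_recovery();
-- f
```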
Both primary and replica are already configured with WAL level <span class="inline-code">logical</span>, as required for logical replication:
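(On both nodes:)

```sql
SHOW wal_level;
-- logical
```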
Note that if you are running Postgres on Amazon RDS (where version 17 is available in the preview environment right now), you’ll need to set the parameter <span class="inline-code">rds.logical_replication</span> to <span class="inline-code">on</span> for this.
There’s a physical replication slot for synchronizing changes from the primary to the replica server:
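(The slot’s name depends on the set-up; the following query lists all slots of the cluster:)

```sql
SELECT slot_name, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots;
```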
This slot must be added to the set of synchronized standby slots using the <span class="inline-code">synchronized_standby_slots</span> configuration option:
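(Run on the primary; "replica_slot" stands in for the name of the physical slot from the previous step:)

```sql
ALTER SYSTEM SET synchronized_standby_slots = 'replica_slot';
SELECT pg_reload_conf();
```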
This setting, new in Postgres 17, makes sure that any logical replication slots we are going to set up in the following cannot advance beyond the confirmed log sequence number (LSN) of this physical slot. Without that setting, logical replication consumers such as Debezium may receive changes which never got propagated to the replica in case of a failure of the primary, resulting in an inconsistent state.
Next, get a database session on the replica:
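(Again, host and port are assumptions about the example set-up; the replica is expected to be exposed on a different port than the primary:)

```sh
pgcli -h localhost -p 5433 -U postgres -d inventorydb
```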
As above, verify that you are on the right node indeed:
It is already configured with <span class="inline-code">hot_standby_feedback=ON</span>, which is a requirement for failover slots to work. In addition, the database name must be added to <span class="inline-code">primary_conninfo</span> on the replica (its connection string for connecting to the primary). I couldn’t find a way to add just a single attribute, so I retrieved the current value, added the database name, and wrote the complete string back (on RDS, that setting can’t be changed; instead, set the <span class="inline-code">rds.logical_slot_sync_dbname</span> parameter to the name of the database):
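(A sketch of these steps on the replica; the database name inventorydb is an assumption, and the &lt;current value&gt; placeholder stands for the previously retrieved <span class="inline-code">primary_conninfo</span> string:)

```sql
SHOW hot_standby_feedback;
-- on

-- Retrieve the current connection string for the primary
SELECT current_setting('primary_conninfo');

-- Write it back with the database name appended
ALTER SYSTEM SET primary_conninfo = '<current value> dbname=inventorydb';
SELECT pg_reload_conf();
```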
At this point, primary and replica are set up for failover slots to work. So let’s create a logical replication slot on the primary next:
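(The slot name and the test_decoding output plug-in are just examples here:)

```sql
-- The fifth argument enables the new failover behavior (see below)
SELECT pg_create_logical_replication_slot('demo_slot', 'test_decoding', false, false, true);
```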
As of Postgres 17, there’s a new optional parameter of the <span class="inline-code">pg_create_logical_replication_slot()</span> function for specifying that a failover slot should be created (<span class="inline-code">failover=true</span>). On the replica, call <span class="inline-code">pg_sync_replication_slots()</span> for synchronizing all failover slots from the primary:
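(This function is also new in Postgres 17:)

```sql
-- On the replica
SELECT pg_sync_replication_slots();
```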
This will make sure that the slots on both primary and replica are at exactly the same LSN. To verify that this is the case, query the status of the slot on both nodes, using the same query as above:
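(Assuming the slot name from above; in Postgres 17, <span class="inline-code">pg_replication_slots</span> also exposes new <span class="inline-code">failover</span> and <span class="inline-code">synced</span> columns which you can inspect here as well:)

```sql
SELECT slot_name, slot_type, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name = 'demo_slot';
```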
Right now, the <span class="inline-code">confirmed_flush_lsn</span> of the slot on the replica matches that of the slot on the primary, i.e. the two slots are in sync. But once a client consumes changes from the primary slot, the slot on the replica will not be updated accordingly. You could manually call <span class="inline-code">pg_sync_replication_slots()</span> repeatedly to synchronize the slot on the replica, but luckily, there’s an easier way. By setting <span class="inline-code">sync_replication_slots</span> to <span class="inline-code">on</span> on the replica, a synchronization worker will be started, which will propagate the replication state automatically:
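(On the replica:)

```sql
ALTER SYSTEM SET sync_replication_slots = on;
SELECT pg_reload_conf();
```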
Now do some data changes on the primary and consume them from the replication slot:
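(Schema, table, and values are assumptions based on the example data set:)

```sql
-- On the primary
UPDATE inventory.customers SET first_name = 'Sarah' WHERE id = 1001;

-- Retrieve and acknowledge the pending change events from the slot
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);
```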
If you query <span class="inline-code">pg_replication_slots</span> once more, you'll see that the confirmed flush LSN of the slot on the replica still matches that of the primary. To conclude our basic experiments, let’s see what happens when we try to consume the replication slot on the replica server:
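(Using the same function as before, just on the other node:)

```sql
-- On the replica; this fails, as a synced failover slot cannot be consumed on a standby
SELECT * FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);
```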
Somewhat to be expected, this triggers an error: as the state of failover slots on a replica is driven by the corresponding slot on the primary, they cannot be consumed. Only after promoting a replica to primary can clients connect to that slot and read change events from it. Let’s give this a try by setting up an instance of the Decodable Postgres CDC connector next!
Failover Slots in Decodable
Decodable is a fully managed real-time data platform based on Apache Flink. It offers managed connectors for a wide range of source and sink systems, allowing you to build robust and efficient ETL pipelines with ease. Its Postgres CDC connector uses Debezium under the hood, which in turn uses logical replication for ingesting data change events from Postgres. Thanks to failover slots, it’s possible to continue streaming changes into Decodable from a Postgres read replica which got promoted to primary, without missing any events.
For this kind of use case it makes sense to put a Postgres proxy in front of the database cluster, exposing one stable endpoint for it. This will enable us later on to fail over from primary to replica without having to reconfigure the Decodable Postgres connector. In a production scenario, this approach will make the fail-over an implementation detail managed by the team owning the database, not requiring coordination with the team running the data platform. The Docker Compose set-up from above contains the pgbouncer proxy for this purpose.
In order to access the database on your machine from Decodable which is running in the cloud, we are going to use ngrok, an API gateway service. Amongst other things, ngrok lets you expose non-public resources to the cloud, i.e. exactly what we need for this example. You may consider this approach also for connecting to an on-prem database in a production use case.
The overall architecture looks like this:
For the following steps, you’ll need to have these things:
- A free Decodable account
- The Decodable CLI installed on your machine
- A free ngrok account
The pgbouncer proxy is configured to connect to the primary Postgres host, and ngrok exposes a publicly accessible tunnel for connecting to pgbouncer.
Decodable lets you manage your resources—connectors, SQL pipelines, etc.—in a declarative way. You describe the resources you’d like to have in a YAML file, and the Decodable platform will take care of materializing them in your account. The example project contains definitions for a Postgres CDC source connector, a Decodable secret with the database password to be used by the connector, and a stream to which the connector will write its data:
ngrok creates a tunnel for publicly exposing the local Postgres instance using a random host name and port. With the help of httpie and jq, we can retrieve the public endpoint of the tunnel from the local ngrok REST API (alternatively, there’s also a UI running on http://localhost:4040/), allowing us to propagate these values to the resource definition via sed before applying it using the Decodable CLI (the special file name - indicates to read from stdin rather than from a file):
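(A sketch of what this can look like; the placeholder names in the YAML file and the file name itself are assumptions about the example project:)

```sh
# Retrieve the public endpoint of the ngrok tunnel from its local API
PUBLIC_URL=$(http localhost:4040/api/tunnels | jq -r '.tunnels[0].public_url')   # e.g. tcp://5.tcp.eu.ngrok.io:12345
PUBLIC_HOST=$(echo "$PUBLIC_URL" | sed -e 's|tcp://||' -e 's|:.*||')
PUBLIC_PORT=$(echo "$PUBLIC_URL" | sed -e 's|.*:||')

# Substitute the placeholders in the resource definitions and apply them via the Decodable CLI
sed -e "s/NGROK_HOST/$PUBLIC_HOST/g" -e "s/NGROK_PORT/$PUBLIC_PORT/g" postgres-cdc.yaml | decodable apply -
```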
At this point, all the resources have been created in your Decodable account, but the Postgres source connector is not running yet. Before we can start it (“activate” in Decodable terminology), we need to create the replication slot in the database, configuring it as a failover slot. For the Decodable Postgres CDC connector to pick up the slot, it must have the name <span class="inline-code">decodable_<connection-id></span>. To obtain the connection id, refer to the output of the <span class="inline-code">decodable apply</span> command above, or retrieve it using <span class="inline-code">decodable query</span> like so:
In the primary Postgres instance, create the replication slot, configuring it as a failover slot:
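(Replace &lt;connection-id&gt; with the actual id from the previous step; pgoutput is assumed as the logical decoding plug-in here:)

```sql
-- The fifth argument marks the slot as a failover slot
SELECT pg_create_logical_replication_slot('decodable_<connection-id>', 'pgoutput', false, false, true);
```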
Next, activate the connector:
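(Using the Decodable CLI; the exact syntax may differ between CLI versions:)

```sh
decodable connection activate <connection-id>
```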
Upon its first activation, the connector will create an initial snapshot of the data in the customers table and then read any subsequent data changes incrementally via the replication slot we’ve created. To take a look at the data, head over to the Decodable web UI in your browser and go to the “Streams” view. Select the <span class="inline-code">inventorydb__inventory__customers</span> stream and go to the “Preview” tab, where you should see the data from the snapshot. Do a few data changes in the Postgres session on the primary node:
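(Table and values are assumptions based on the example data set:)

```sql
UPDATE inventory.customers SET first_name = 'Anne Marie' WHERE id = 1004;
UPDATE inventory.customers SET email = 'sally.thomas@example.org' WHERE id = 1001;
```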
Shortly thereafter, these updates will be reflected in the stream preview as well:
Now let’s simulate a failover from primary to replica server and see how the failover of the replication slot is handled. Stop the primary Postgres node:
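(The service name is an assumption about the Compose file:)

```sh
docker compose stop postgres_primary
```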
Go to the database session on the replica and promote it to the primary:
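(The <span class="inline-code">pg_promote()</span> function triggers the promotion:)

```sql
SELECT pg_promote();
```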
At this point, pgbouncer still points to the previous, now defunct primary server. Change its configuration in the Docker Compose file so it forwards to the new primary:
Stop and restart pgbouncer (other proxies such as pgcat are also able to reload updated configuration on the fly):
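(Again assuming the service name from the Compose file; recreating the container makes it pick up the changed configuration:)

```sh
docker compose up -d --force-recreate pgbouncer
```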
If you go back to the Decodable web UI and take a look at the <span class="inline-code">inventorydb_source</span> connection, it should be in the “Retrying” state at this point, as it lost the connection to the previous primary (via the proxy). After a little while, it will be in the “Running” state again, as the proxy now routes to the new primary server. The connector now consumes from the replication slot on that new primary server, as you can confirm by doing a few more data changes on that node and examining the data in the Decodable stream preview:
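(For instance:)

```sql
-- On the new primary, i.e. the promoted replica
UPDATE inventory.customers SET last_name = 'Walker' WHERE id = 1001;
```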
If you retrieve the state of the replication slot again, using the same query as above, you’ll also see that it is marked as <span class="inline-code">active</span> now:
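(On the new primary:)

```sql
SELECT slot_name, active, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_name LIKE 'decodable_%';
```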
Wrapping Up
Failover slots are an essential part of using Postgres and logical replication in HA scenarios. Added in Postgres 17, they allow logical replication clients such as Debezium to seamlessly continue streaming change events after a database failover, ensuring no events are lost in the process. This renders previous solutions such as manually synchronizing slots on a standby server (as supported since Postgres 16) or external tools such as pg_failover_slots obsolete (although the latter still comes in handy if you are on an older Postgres version and can’t upgrade to 17 just yet).
To create a failover slot, the new <span class="inline-code">failover</span> parameter must be set to true when calling <span class="inline-code">pg_create_logical_replication_slot()</span>. Debezium does not do this yet, but I am planning to implement the required change in the next few weeks. Keep an eye on DBZ-8412 to track the progress there. In the meantime, you can create the replication slot manually, as described above. To learn more about Postgres failover slots, check out the blog posts by Bertrand Drouvot and Amit Kapila. RDS users should refer to the AWS documentation on managing failover slots.
If you’d like to give it a try yourself, including the managed Postgres CDC connector on Decodable, you can find the complete source code for this blog post in the Decodable examples repository on GitHub.