How to Setup Physical Replication With PostgreSQL

“PostgreSQL is a popular open-source relational database management system that is supported by the most operating system. The way of keeping the copy of the main database is called replication. The copy of the main database is replicated on the different physical servers. The replication is done by setting the master-slave configuration. Here, the master acts as the primary instance and stores the main copy of the database, and the slave acts as the secondary instance that stores the identical copy of the main database. The master works as the read-write server, and the slave works as the read-only server. Two types of replications are supported by the PostgreSQL database that helps to implement scalable, high availability, fault-tolerant applications. These are physical streaming replication and logical replication. The way to set up physical or streaming replication with the PostgreSQL database has been explained in this tutorial.”

Physical or Streaming Replication

The most commonly used replication solution is Write-Ahead Log (WAL) Shipping or streaming replication. The standby database server or replication slave is configured to make the connection with the primary/master server. The WAL records are streamed to the standby server before filling the WAL file. The data is transmitted to the standby server in an asynchronous mode by default. That means the data is transmitted to the standby server after committing the transaction in the primary server. This may cause data loss because if the master server crashes without committing any transaction, then that transaction will not be replicated to the standby server.

The Architecture of Physical Replication

A background process that is called WAL sender starts on the master machine after setting up the configurations of physical replication. It accepts a request from the standby and streams the WAL records to the standby continuously. Another background process that is called the WAL receiver starts on the slave machine that receives and applies the changes of the main database to the database of the slave machine. The physical replication can be defined by the following diagram.

Physical Replication Features

The entire data of a single cluster is copied by the physical replication process, where the cluster is the set of databases managed by the PostgreSQL database. The source machine is called the primary server, and the destination machine is called the standby server. Some important features of physical replication have been mentioned below.

WAL Files

The ordered series of WAL records are generated by the PostgreSQL server. These records will be transported to another machine by using physical replication, and the modification will be done in the local database. The WAL records are divided into equally sized files that are called WAL segments. These files are created on the pg_wal directory, and old WAL files are removed when they are not needed anymore.

Hot, Cold, and warm standby

The hot standby server is used to store the up-to-date data and allow the clients to execute the read-only transaction. The cold standby server is not up and running normally, and it starts when a failure occurs. The warm standby server works similar to the hot standby server, except that it can’t make the connection with the client.

Recovery Mode

Warm and hot standby servers run in recovery mode, and Postgres will import and apply WAL files generated by a primary server in the recovery mode.

Advantages of Physical or Streaming Replication

It can be used to keep the backup of the database and helps to retrieve the data when the primary server crashes.
It is better to use read-only operations.
It guarantees that a copy of the last operation of the primary server will be saved.
One or more standby servers can be connected with the primary server, and the log information streams from the primary server to all connected standby servers. If one of the standby servers is disconnected or delayed for any reason, then the streaming will be continued for another standby server.
If the primary server is disconnected or shut down for any reason, then it will wait to send the updated data to the standby server before powering off.

Disadvantages of Physical or Streaming Replication

It replicates data asynchronously by default. That means WAL files that are not copied to the standby server may be removed when any changes are done in the primary server. It is better to set a higher value into the wal_keep_segments to prevent data loss.
It does not support replication between different PostgreSQL servers.
The data is not more secure in this replication because the sensitive data can be extracted without user authentication.
It does not truncate the command.
It does not support foreign tables, view, and partition root tables.
It does not support large object replication.
It does not support database schema and DDL commands.

Physical Replication Setup

The steps of implementing logical replication in the PostgreSQL database have been shown in this part of this tutorial.

Pre-requisites

Setup the master and replica nodes

You can set the master and the replica nodes in two ways. One way is to use two separate computers where Ubuntu operating system is installed, and another way is to use two virtual machines that are installed on the same computer. The testing process of the physical replication process will be easier if you use two separate computers for the master node and replica node because a specific IP address can be assigned easily for each computer. But if you use two virtual machines on the same computer, then the static IP address will require to be set for each virtual machine and make sure both virtual machines can communicate with each other through the static IP address. I have used two virtual machines to test the physical replication process in this tutorial. The hostname of the master node has been set to fahmida-master, and the hostname of the replica node has been set to fahmida-slave here.

Install PostgreSQL on both master and replica nodes

You have to install the latest version of the PostgreSQL database server on two machines before starting the steps of this tutorial. PostgreSQL version 14 has been used in this tutorial. Run the following commands to check the installed version of the PostgreSQL in the master node.

Run the following command to become a root user.

$ sudo -i

Run the following commands to log in as Postgres users with superuser privileges and make a connection with the PostgreSQL database.

$ su - postgres

$ psql

The output shows that PostgreSQL version 14.4 has been installed on Ubuntu version 22.04.1.

You have to install the same PostgreSQL version in the replica node because the logical replication can’t be set up between different versions of the PostgreSQL server.

Primary Node Configuration

The necessary configurations for the primary node have been shown in this part of the tutorial. The following tasks will be completed in the primary node.

Modify the postgresql.conf file to set up the IP address and WAL level.
Create a role user with that password to make a connection with the primary node from the replica node.
Modify pg_hba.conf file to add network connection information

Modify the postgresql.conf file

You have to set up the IP address of the primary node in the PostgreSQL configuration file named postgresql.conf that is located on the location, /etc/postgresql/14/main/postgresql.conf. Log in as the root user in the primary node and run the following command to edit the file.

$ nano /etc/postgresql/14/main/postgresql.conf

Find out the listen_addresses variable in the file, remove the hash (#) from the beginning of the variable to uncomment the line. You can set an asterisk (*) or the IP address of the primary node for this variable. If you set asterisk (*), then the primary server will listen to all IP addresses. It will listen to the specific IP address if the IP address of the primary server is set to this variable. In this tutorial, the IP address of the primary server that has been set to this variable is 192.168.10.5.

listen_addressess = “<IP address of your primary server>”

Next, find out the wal_level variable to set the replication type. Here, the value of the variable will be a replica.

wal_level = replica

Run the following command to restart the PostgreSQL server after modifying the postgresql.conf file.

$ systemctl restart postgresql

***Note: After setting up the configuration, if you face a problem starting the PostgreSQL server, then run the following commands for the PostgreSQL version 14.

$ sudo chmod 700 -R /var/lib/postgresql/14/main

$ sudo -i -u postgres

# /usr/lib/postgresql/14/bin/pg_ctl restart -D /var/lib/postgresql/14/main

You will be able to connect with the PostgreSQL server after executing the above command successfully.

Create role/user for replication

A role/user with specific permission is required for the replication. Run the following SQL command to create a role with the user for the replication.

# CREATE ROLE replicauser WITH REPLICATION LOGIN PASSWORD '12345';

The following output will appear if the role is created successfully.

Modify the pg_hba.conf file

You have to set up the IP address of the secondary node in the PostgreSQL configuration file named pg_hba.conf that is located on the location, /etc/postgresql/14/main/pg_hba.conf. Log in as the root user in the primary node and run the following command to edit the file.

$ nano /etc/postgresql/14/main/pg_hba.conf

Add the following information at the end of this file.

host <database name> <username> <IP address of the slave server>/32 scram-sha-256

The IP of the slave server is set to “192.168.10.10” here. According to the previous steps, the following line has been added to the file.

host replication replicauser 192.168.10.10/32 scram-sha-256

Restart the PostgreSQL server

Run the following command to restart the PostgreSQL server as a root user.

$ systemctl restart postgresql

Replica Node Configuration

The necessary tasks will be done for the replica node where the copy of the main database will be stored. The existing database content of the replica node will be removed to keep back up the database of the primary node and make the PostgreSQL server of the replica node read-only.

Stop the PostgreSQL server on the replica node

Run the following command after logging in as a root user to stop the PostgreSQL server.

$ systemctl stop postgresql

Remove the existing content of the replica node

Run the following command to remove existing database content from the PostgreSQL server of the replica node. It is necessary for replication purposes. The replica node will be worked in a read-only mode after executing the following command

$ rm -rf /var/lib/postgresql/14/main/*

Testing the Physical Replication Process

You have to create a database with one or more tables in the primary node to test whether the physical replication is working properly or not. Here, the existing database named postgres of the primary server has been used for testing purposes. If you want, you can create the table by creating a new database. The following tasks will be done in this part of the tutorial.

Add a new table in the selected database of the primary node.
Add a record to the table.
Store the copy of the database of the primary node to the replica node.
Create a table in the primary server

Login to the PostgreSQL database of the primary node and run the following SQL statement to create an employees table in the existing database postgres. If you want, you can create a new database and create the table by selecting the new database. I didn’t create any new database here. So, the table will be created on the default database. The employees’ table will contain 5 fields. These are id, name, address, email, and phone.

# CREATE TABLE employees (

id serial PRIMARY KEY,

name VARCHAR(15) NOT NULL,

address TEXT NOT NULL,

email VARCHAR(30),

phone CHAR(14) NOT NULL );

Run the following SQL statement to insert a record into the table.

# INSERT INTO employees (name, address, email, phone)

VALUES ('Farheen Hasan', '12/A, Dhanmondi, Dhaka.', '[email protected]', '+8801826783423');

Run the following SQL statement to read the content of the employees' table.

# Select * from employees;

If the table is created and a record is inserted in the table successfully, then the following output will appear after executing the select query. One record was inserted into the table that is shown in the output.

Copy the database to a replica server

Login to the PostgreSQL database of the replica server and run the following SQL statement to create a copy of the postgres database of the primary server into the replica server. After executing the statement, it will ask for the password of the user that was created in the previous step. If the authentication is done successfully, then the replication will be started.

# pg_basebackup -R -h 192.168.10.5 -U replicauser -D /var/lib/postgresql/14/main –P

The following output shows that the replication is done successfully, and 34962 KB has been copied.

Run the following command after completing the replication as the root user to restart the PostgreSQL server in the replica node.

$ systemctl restart postgresql

Now, login to the PostgreSQL server on the replica node and run the following SQL statement to check whether the employees’ table has been copied to the replica node or not.

# select * from employees;

The following output shows the content of the employees’ table of the replica node is the same as the content of the employees’ table of the primary node.

You can add one or more records or update records or delete records in the employees’ table of the primary node or add one or more tables in the selected database of the primary node and check the database of the replica node to verify that the updated content of the primary database is replicated properly in the database of the replica node or not.

Insert new records in the primary node:

Now, run the following INSERT command to add three more records in the employees’ table of the postgres database that is located in the primary node.

# INSERT INTO employees (name, address, email, phone)

VALUES ('Abir Hossain', '10, Jigatola, Dhaka.', '[email protected]', '+8801888564345'),

('Nila Chowdhury', '67/B, Mirpur, Dhaka.', '[email protected]', '+8801799453123'),

('Mizanur Rahman', '8/C, Malibag, Dhaka.', '[email protected]', '+8801957864564');

Run the following command to check the current content of the employees' table in the primary node.

# Select * from employees;

The following output shows that three new records have been inserted properly into the table.

Check the replica node after insertion:

Now, you have to check whether the employees table of the replica node has been updated or not. Login to the PostgreSQL server of the replica node and run the following command to check the content of the employees’ table.

# Select * from employees;

The following output shows that three new records have been inserted in the employees’ table of the replica node that was inserted in the primary node of the employees’ table. So, the changes in the main database have been replicated properly in the replica node.

Update record in the primary node:

Run the following UPDATE command that will update the value of the phone field where the value of the name field is “Nila Chowdhury”. There is only one record in the employees’ table that matches the condition of the UPDATE query.

# UPDATE employees SET phone = “+8801911111111” WHERE name = “Nila Chowdhury”;

Run the following command to check the current content of the employees’ table in the primary node.

# Select * from employees;

The following output shows that the phone field value of the particular record has been updated after executing the UPDATE query.

Check the replica node after the update:

# Select * from employees;

The following output shows that one record has been updated in the employees’ table of the replica node, which was updated in the primary node of the employees’ table. So, the changes in the main database have been replicated properly in the replica node.

Delete record in the primary node:

Run the following DELETE command that will delete a record from the employees’ table of the primary node where the value of the email field is “[email protected]”. There is only one record in the employees’ table that matches the condition of the DELETE query.

# DELETE FROM employees where email = <a href="mailto:'[email protected]">'[email protected]</a>';

Run the following command to check the current content of the employees’ table in the primary node.

# Select * from employees;

The following output shows that one record has been deleted after executing the DELETE query.

Check the replica node after deleting the record:

Now, you have to check whether the employees table of the replica node has been deleted or not. Login to the PostgreSQL server of the replica node and run the following command to check the content of the employees’ table.

# Select * from employees;

The following output shows that one record has been deleted in the employees’ table of the replica node, which was deleted in the primary node of the employees’ table. So, the changes in the main database have been replicated properly in the replica node.

Conclusion

The purpose of physical replication for keeping the backup of the database, the architecture of the physical replication, the advantages and disadvantages of the physical replication, and the steps of implementing physical replication in the PostgreSQL database have been explained in this tutorial with examples. I hope the concept of physical replication will be cleared for the users, and the users will be able to use this feature in their PostgreSQL database after reading this tutorial.

How to Setup Physical Replication With PostgreSQL

Physical or Streaming Replication

The Architecture of Physical Replication

Physical Replication Features

WAL Files

Hot, Cold, and warm standby

Recovery Mode

Advantages of Physical or Streaming Replication

Disadvantages of Physical or Streaming Replication

Physical Replication Setup

Setup the master and replica nodes

Install PostgreSQL on both master and replica nodes

Primary Node Configuration

Modify the postgresql.conf file

Create role/user for replication

Modify the pg_hba.conf file

Restart the PostgreSQL server

Replica Node Configuration

Stop the PostgreSQL server on the replica node

Remove the existing content of the replica node

Testing the Physical Replication Process

Copy the database to a replica server

Insert new records in the primary node:

Check the replica node after insertion:

Update record in the primary node:

Check the replica node after the update:

Delete record in the primary node:

Check the replica node after deleting the record:

Conclusion

About the author

Fahmida Yesmin