Add Node to Apache Kafka Cluster

An Apache Kafka cluster is a set of servers that run Apache Kafka and work together to manage the incoming and outgoing data streams of a distributed architecture.
The servers in the cluster coordinate with each other to handle data replication, failover, and load balancing, which gives the cluster high availability and scalability.

The cluster combines the resources of its servers to provide a publish-subscribe messaging system for streaming data, and it lets multiple consumers process that data in parallel.

On the other hand, a Kafka node refers to a single instance of the Apache Kafka software that runs on a machine. In practice, a node usually means a broker: a server that stores and manages the incoming data streams. Clients, which publish or subscribe to the data streams, connect to the brokers rather than forming part of the cluster themselves.

In a Kafka cluster, each node plays a specific role in managing the data and communicating with the other nodes in the cluster. A single machine can also run more than one broker process, each with its own broker ID, port, and log directory, but it is brokers on separate machines that add real capacity and fault tolerance.

As your application grows and needs more resources, you may need to add a node to an existing cluster. This is a standard way to scale a Kafka cluster so that it can handle a larger data volume or tolerate more failures.

In this tutorial, you will learn how to add a node to an existing cluster by configuring the node as a Kafka broker and updating the Kafka cluster configuration.

Step 1: Installing Apache Kafka

The first step in setting up a new Kafka node is to install the Kafka packages on the machine that will join the cluster. Installing Kafka is straightforward. You can check how to install and configure Kafka on Debian and Ubuntu.

Once Kafka is installed, we can proceed to the next step.

Step 2: Setting up the Node Broker ID

A broker ID is a unique identifier that is assigned to each broker in a cluster. Kafka uses it to identify the broker and to track which broker is responsible for which data partitions.

To add a new node to an existing cluster, we need to configure the server as a broker by assigning it a unique broker ID.

NOTE: When adding a new node to an existing cluster, it is crucial to specify a unique broker ID for the new node to avoid conflicts with existing nodes in the cluster.

We can specify the broker ID in the broker’s configuration file. The broker ID is a non-negative integer.

Open the terminal and edit the Kafka configuration file for the server that you wish to add to the cluster:

$ sudo nano /opt/kafka/config/server.properties

Locate the “Server Basics” block and change the “broker.id” property to the unique ID of your node.

broker.id=3

In this case, we set the broker ID to 3.
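
The same edit can be scripted, which helps when several machines are prepared at once. The following is a minimal bash sketch, not an official Kafka tool: the function name set_broker_id is hypothetical, and the file path is passed in by the caller.

```shell
# Hypothetical helper (not part of Kafka): set broker.id in a properties file.
# Usage: set_broker_id <path-to-server.properties> <id>
set_broker_id() {
  local file="$1" id="$2"

  # The broker ID must be a non-negative integer; reject anything else.
  case "$id" in
    ''|*[!0-9]*)
      echo "broker id must be a non-negative integer" >&2
      return 1
      ;;
  esac

  if grep -q '^broker\.id=' "$file"; then
    # An entry exists: rewrite it in place and keep a .bak copy of the file.
    sed -i.bak "s/^broker\.id=.*/broker.id=${id}/" "$file"
  else
    # No entry yet: append one at the end of the file.
    printf 'broker.id=%s\n' "$id" >> "$file"
  fi
}
```

For the file used in this tutorial, the call would be set_broker_id /opt/kafka/config/server.properties 3, run as a user with write access to that file. The script only checks the format of the ID; it cannot tell whether another broker in the cluster already uses it.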

Step 3: Configure the Node’s Address

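Next, we need to set the “advertised.listeners” property in the new node’s “server.properties” file to the address that clients and the other brokers should use to reach it. Each broker advertises only its own address, so the existing nodes keep their current values. In a ZooKeeper-based cluster, the new node must also point to the same ZooKeeper ensemble as the existing brokers through the “zookeeper.connect” property. This is what allows the new node to join the cluster.

For a three-node cluster, each node’s own file contains one of the following lines: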

# Node 1
advertised.listeners=PLAINTEXT://192.168.0.100:9092
# Node 2
advertised.listeners=PLAINTEXT://192.168.0.101:9092
# Node 3
advertised.listeners=PLAINTEXT://192.168.0.102:9092

In the given example, each node advertises its listener address (the IP address and port number) that clients and the other nodes in the cluster use to communicate with it.

The listener protocol is specified as PLAINTEXT for unencrypted communication. The IP address and port number must be set correctly and must be reachable from the other cluster nodes; otherwise, the nodes cannot communicate with each other.
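
Whether an advertised address is actually reachable can be tested from another machine before the broker is put to use. The sketch below is one way to do it in bash, under a few assumptions: parse_listener and check_listener are hypothetical helper names, the check relies on bash’s /dev/tcp feature and on the timeout command from GNU coreutils, and it only proves that the TCP port accepts connections, not that a healthy broker is listening.

```shell
# Hypothetical helper: split PROTOCOL://host:port into
# "PROTOCOL host port" (space separated).
parse_listener() {
  local listener="$1"
  local protocol="${listener%%://*}"   # text before "://"
  local hostport="${listener#*://}"    # text after "://"
  local host="${hostport%:*}"          # text before the last ":"
  local port="${hostport##*:}"         # text after the last ":"
  printf '%s %s %s\n' "$protocol" "$host" "$port"
}

# Hypothetical helper: try to open a TCP connection to the advertised
# address, giving up after 3 seconds.
check_listener() {
  local protocol host port
  read -r protocol host port <<< "$(parse_listener "$1")"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "reachable: ${host}:${port} (${protocol})"
  else
    echo "NOT reachable: ${host}:${port} (${protocol})"
    return 1
  fi
}
```

Running check_listener PLAINTEXT://192.168.0.102:9092 from one of the existing nodes reports whether the new node’s port accepts connections from that node.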

Step 4: Start the New Node

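Once you have configured the node’s address, start the Kafka server on the new node. In a ZooKeeper-based cluster, the ZooKeeper ensemble used by the existing brokers must already be running and reachable, because the new broker registers with it on startup. This allows the node to join the cluster and accept connections.

You can check the state of the cluster by describing a topic through any broker. The address in the following command is taken from the example in Step 3 and the topic name from the sample output; replace both with your own values:

$ kafka-topics.sh --describe --bootstrap-server 192.168.0.100:9092 --topic sample_d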

Example Output:

Topic: sample_d     PartitionCount: 3    ReplicationFactor: 3    Configs:
    Topic: sample_d     Partition: 0    Leader: 1    Replicas: 1,2,3    Isr: 1,2,3
    Topic: sample_d     Partition: 1    Leader: 2    Replicas: 2,3,1    Isr: 2,3,1
    Topic: sample_d     Partition: 2    Leader: 3    Replicas: 3,1,2    Isr: 3,1,2

The “Leader” column shows the broker ID of the node that is currently acting as the leader for each partition, the “Replicas” column lists the broker IDs of the nodes that store a replica of the partition, and the “Isr” column lists the replicas that are currently in sync with the leader.
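
On a cluster with many topics, it is easier to let a script collect the broker IDs from that output. The following bash sketch assumes the output format shown above; brokers_in_replicas is a hypothetical helper name, not a Kafka command. Keep in mind that it lists only brokers that hold at least one replica, so a freshly added broker with no partitions assigned to it does not show up.

```shell
# Hypothetical helper: read `kafka-topics.sh --describe` output on stdin and
# print the unique broker IDs found in the "Replicas:" fields, one per line.
brokers_in_replicas() {
  awk '{
    # Scan each line for the "Replicas:" label; the next field is the ID list.
    for (i = 1; i < NF; i++) {
      if ($i == "Replicas:") {
        n = split($(i + 1), ids, ",")
        for (j = 1; j <= n; j++) print ids[j]
      }
    }
  }' | sort -n -u
}
```

It can be used by piping the describe command from this step into brokers_in_replicas.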

A newly added broker does not take over existing partitions automatically, so after adding the node to the cluster you typically need to reassign partitions to it. Check the following resource to learn more:

https://kafka.apache.org/documentation/#basic_ops_cluster_expansion
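
The procedure described at that link uses the kafka-reassign-partitions.sh tool that ships with Kafka. As a rough sketch only, the first step (asking Kafka to propose a new assignment) could look like the following in bash. The helper names write_topics_to_move and propose_reassignment are hypothetical, and the use of --bootstrap-server assumes a reasonably recent Kafka release; older releases expect a ZooKeeper address instead.

```shell
# Hypothetical helper: write the JSON file expected by the
# --topics-to-move-json-file option.
# Usage: write_topics_to_move <output-file> <topic> [<topic> ...]
write_topics_to_move() {
  local out="$1"; shift
  local sep="" topic
  {
    printf '{"version":1,"topics":['
    for topic in "$@"; do
      printf '%s{"topic":"%s"}' "$sep" "$topic"
      sep=","
    done
    printf ']}\n'
  } > "$out"
}

# Hypothetical helper: ask Kafka for a candidate assignment that spreads the
# listed topics over the given brokers. This only prints a proposal; nothing
# moves until the proposed plan is saved to a file and run with --execute.
# Usage: propose_reassignment <host:port> <comma-separated-broker-ids> <topics-file>
propose_reassignment() {
  local bootstrap="$1" brokers="$2" topics_file="$3"
  kafka-reassign-partitions.sh \
    --bootstrap-server "$bootstrap" \
    --topics-to-move-json-file "$topics_file" \
    --broker-list "$brokers" \
    --generate
}
```

With the values used in this tutorial, that would be write_topics_to_move topics.json sample_d followed by propose_reassignment 192.168.0.100:9092 "1,2,3" topics.json. Review the proposed plan before applying it, since moving partitions copies data between brokers.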

Conclusion

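We discussed how to add a node to an existing cluster by setting up the node as a broker: installing Kafka, assigning a unique broker ID, configuring the node’s advertised address, and starting the node.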

About the author

John Otieno

My name is John, and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks.