There are two common approaches to scaling a server: vertical scaling and horizontal scaling. Vertical scaling, or scaling up, means adding more power and resources to a single server, such as more CPUs, memory, and storage, which can become costly. Horizontal scaling, or scaling out, means adding more nodes to your existing resource pool. Depending on your limitations and requirements, you can either run a single larger server instance or deploy multiple server nodes.
Assume you have 100 GB of RAM and need to hold 200 GB of data. In this case, you have two choices:
- Scale up by adding more RAM to the system
- Scale out by adding another server instance with 100 GB of RAM
If you have reached the maximum RAM limit within your infrastructure, then scaling out is the ideal approach. In addition, scaling out can substantially increase database throughput.
Redis executes commands on a single thread. As a result, it cannot use multiple cores of your server’s CPU to process commands, so adding more CPU cores gives little extra throughput or performance. Splitting your data among multiple server instances is different. Adding several servers and distributing the data set among them allows client requests to be processed in parallel, which increases throughput. Overall performance may even increase close to linearly with the number of servers.
This approach of splitting or distributing data among multiple servers with scaling in mind is called sharding. All the servers that store portions of data are called shards.
How Sharding Is Done — Algorithmic Sharding
One of the major concerns with sharding is how to locate a given key among multiple Redis nodes. Because a given key can be stored in any of the available shards, querying all shards to find a specific key is not a good option. There has to be a way to map each key to a specific shard, and Redis uses an algorithmic sharding strategy for this.
The most common approach is to calculate a hash value from the Redis key name and then take that value modulo the number of available Redis shards in the system. The remainder identifies the shard that holds the key.
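The hash-and-modulo mapping can be sketched in a few lines of Python. This is a minimal illustration, not what any particular Redis client does: the function name `shard_for_key` is hypothetical, and CRC32 is chosen here only because it is a stable hash available in the standard library.

```python
import zlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Return the index of the shard that should hold `key`."""
    # CRC32 is an assumption made for this sketch; any stable hash would do.
    key_hash = zlib.crc32(key.encode("utf-8"))
    # The remainder is always in the range 0 .. num_shards - 1.
    return key_hash % num_shards


for key in ("user:1", "user:2", "cart:42"):
    print(key, "-> shard", shard_for_key(key, 3))
```

Because the hash depends only on the key name, every client that uses the same function and the same shard count arrives at the same shard without asking the other nodes.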
This is quite a good solution as long as the total number of shards is constant. Whenever you add a new Redis server instance, the resulting value for a given key may change because the total number of shards has increased, and a client would end up querying the wrong Redis shard. You therefore have to reshard: calculate the new shard for each key and transfer the data to the correct server. This is cumbersome and far from trivial if your total shard count increases from time to time.
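To get a feel for how disruptive this is, the sketch below counts how many keys map to a different shard when the shard count goes from 3 to 4. The helper names and the synthetic `user:<n>` keys are assumptions made for the illustration, and CRC32 again stands in for whatever hash a real setup would use.

```python
import zlib


def shard_for_key(key: str, num_shards: int) -> int:
    """Hash the key name and take it modulo the shard count."""
    return zlib.crc32(key.encode("utf-8")) % num_shards


def moved_keys(keys, old_count: int, new_count: int):
    """Return the keys whose shard changes when the shard count changes."""
    return [
        key
        for key in keys
        if shard_for_key(key, old_count) != shard_for_key(key, new_count)
    ]


keys = [f"user:{i}" for i in range(10_000)]  # synthetic sample keys
moved = moved_keys(keys, 3, 4)
print(f"{len(moved)} of {len(keys)} keys map to a different shard")
```

With an evenly distributed hash, roughly three quarters of the keys would be expected to move in this 3-to-4 case, which is why plain modulo-by-shard-count makes adding a server so expensive.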
Redis uses a logical entity called a hash slot to prevent this problem. Each shard owns a set of hash slots, and a single hash slot can hold multiple Redis keys. A Redis cluster always has 16384 hash slots, and this number never changes. The modulo is taken with the number of hash slots instead of the shard count, so a given key always maps to the same hash slot even when the number of shards has increased. Resharding then becomes a matter of moving hash slots, together with the keys they hold, from existing shards to the new one, which redistributes data across the Redis instances as required.
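The following sketch shows the idea with a slot-ownership table. Redis Cluster computes the slot as CRC16 of the key modulo 16384; Python's `binascii.crc_hqx` with an initial value of 0 produces the same CRC16 variant. Everything else is an assumption for illustration: the helper names, the contiguous slot layout, and the choice of which slots to move are hypothetical, and hash tags are ignored.

```python
import binascii

HASH_SLOTS = 16384  # fixed number of hash slots in a Redis cluster


def key_slot(key: str) -> int:
    """Map a key to its hash slot (hash tags are ignored in this sketch)."""
    # crc_hqx with an initial value of 0 is the XMODEM variant of CRC16.
    return binascii.crc_hqx(key.encode("utf-8"), 0) % HASH_SLOTS


def initial_slot_owners(num_shards: int) -> list[int]:
    """Give each shard one contiguous range of slots (hypothetical layout)."""
    return [slot * num_shards // HASH_SLOTS for slot in range(HASH_SLOTS)]


def move_slots(owners: list[int], slots: range, new_shard: int) -> None:
    """Reassign the given slots to new_shard, as a resharding step would."""
    for slot in slots:
        owners[slot] = new_shard


owners = initial_slot_owners(3)
slot = key_slot("user:1")
print("slot", slot, "is owned by shard", owners[slot])

# Add a fourth shard (index 3) and hand it the last quarter of the slots.
move_slots(owners, range(12288, HASH_SLOTS), 3)
print("slot", key_slot("user:1"), "is owned by shard", owners[slot])
```

The slot of a key is the same before and after the new shard is added. Only the ownership table changes, and only for the slots that were explicitly moved.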
Benefits of Redis Sharding
Redis sharding brings several benefits to your database system with minimal changes.
Since Redis is single-threaded, it can’t process multiple client requests in parallel using multiple CPU cores. Adding new shards or server instances lets you perform Redis operations in parallel across those instances. This increases the operations per second of your Redis database, which gives you higher throughput.
With the sharding approach, a Redis cluster can be set up in a master-replica architecture, which helps with high availability and durability.
In such a setup, replicas keep an exact copy of your data and can serve read operations through separate Redis instances, which improves the performance of your read queries.
Apart from these benefits, sharding has a drawback: a Redis cluster with an even number of shards may run into split-brain situations. For that reason, keeping an odd number of shards in your Redis cluster is recommended.
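One way to see why an odd count helps is to look at what happens when a network partition divides the shards into two groups. The sketch below assumes that a group may only keep making decisions when it holds a strict majority of the shards; the function name `has_majority` is hypothetical.

```python
def has_majority(partition_size: int, total_shards: int) -> bool:
    """True if this side of a partition holds more than half of the shards."""
    return partition_size > total_shards // 2


# Four shards split evenly: neither side holds a majority.
print(has_majority(2, 4), has_majority(2, 4))

# Five shards split 3/2: exactly one side holds a majority.
print(has_majority(3, 5), has_majority(2, 5))
```

With an even number of shards, an even split leaves no side able to claim the majority, whereas an odd number always leaves exactly one side with more than half.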
To summarize, Redis sharding means splitting data among multiple servers, which enables scaling and high throughput for your database. As discussed, Redis uses an algorithmic sharding strategy to point client requests to the correct shard. Taking the modulo by the total number of shards has drawbacks when that number increases, so Redis takes the modulo by the fixed number of hash slots instead and maps each slot to a shard. With sharding in place, Redis databases can provide high availability, high throughput, and high performance.