Apache Cassandra

Cassandra Show Snapshots

Backups are incredible features, especially when working with data-critical environments. In Apache Cassandra, we can create backups of database data stored as SSTable files. You can then use the backup files to restore the database in the case of data loss, node, or partition failure. Backups can also be used to replicate the database in another machine, removing the need to recreate the structure from scratch.

Cassandra supports two main types of backups:

  1. Snapshots
  2. Incremental Backups

In this tutorial, we will focus on snapshot backups. First, we learn how we can initialize and create the database backups stored in an Apache Cassandra cluster.

Let’s dive in.

What Are Snapshots?

In the context of an Apache Cassandra cluster, a snapshot refers to a copy of a table’s SSTable files at a specific time. SSTable or Sorted Strings Table is a file format that Apache Cassandra uses to store the in-memory data in memtables for fast access. SSTable files are immutable and are removed or merged with new SSTable files as the data changes.

Snapshots in Cassandra can be issued manually by the user or automated by enabling the feature in the configuration files.

Setting Up a Sample Data to Illustrate the Snapshots in Cassandra

Before illustrating how to perform the snapshots in Cassandra, let us create some sample data to demonstrate how to create snapshots.

Let’s start by creating a snapshot.

cassandra@cqlsh> CREATE KEYSPACE snapshotting
... WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};

The previous query creates a keyspace with the SimpleStrategy and replication factor of 1.

We can then switch to that keyspace and create two tables:

cassandra@cqlsh> USE snapshotting;

Next, create a users’ table as follows:

cassandra@cqlsh:snapshotting> CREATE TABLE users(
... id int,
... username text,
... email text,
... PRIMARY KEY(id)
... );

We can also create another table that is called with a similar structure:

cassandra@cqlsh:snapshotting> CREATE TABLE users_copy(
... id int,
... username text,
... email text,
... PRIMARY KEY(id)
... );

Finally, we can add some sample data to the table as shown:

INSERT INTO users (id, username, email) VALUES (0, 'username1', '[email protected]');
INSERT INTO users (id, username, email) VALUES (1, 'username2', '[email protected]');
INSERT INTO users_copy (id, username, email) VALUES (0, 'username1', '[email protected]');
INSERT INTO users_copy (id, username, email) VALUES (1, 'username2', '[email protected]');

We can then query the tables as follows:

cassandra@cqlsh:snapshotting> SELECT * FROM users;

cassandra@cqlsh:snapshotting> SELECT * FROM users_copy;

Output:

Configure the Cassandra Cluster for Snapshots

Before creating any snapshots, it’s good to ensure that automatic snapshot creation is disabled. Edit the cassandra.yml file and set the following value:

auto_snapshot: false

It’s also recommended to disable the automatic compaction before the snapshot creation. In the cassandra.yml file, set the following value:

snapshot_before_compaction: false

Once the given settings are ready, restart your Cassandra cluster to apply the changes.

Taking Snapshots of All Keyspaces

When manually creating snapshots in Cassandra, we use the nodetool command. You can run the following command:

$ nodetool help snapshot

To view the available command options.

To take a snapshot of all the keyspaces in the Cassandra cluster, we can run the following command:

$ nodetool snapshot

The command should return a message as shown:

By default, Cassandra creates a snapshot with the name of the current timestamp.

To specify the name of the snapshot, you can use the -t option as shown in the following command:

$ nodetool snapshot -t backups

This creates a snapshot of all the keyspaces in the cluster and store it in the backups directory.

Taking a Snapshot of a Single Keyspace

You can also take a backup of a single keyspace in the cluster by specifying the keyspace name. For example, to take a snapshot of the snapshotting keyspace that we created earlier, we can run the following command:

$ nodetool snapshot -t snapshotting_backup snapshotting

Cassandra creates a snapshot directory for each table in the specified keyspace. For example, since the “snapshotting” keyspace contains two tables, Cassandra creates a snapshot directory for each.

By default, Cassandra stores the snapshots in the /var/lib/cassandra/data directory.

$ ls -la /var/lib/cassandra/data/snapshotting/

You should see directories of each table in the keyspace.

Inside each file, you will find the other files and directories as shown:

Taking a Snapshot of a Single Table Within a Keyspace

Sometimes, you may want to take a snapshot of a specific table within a given keyspace. For that, you can use the –table option followed by the name of the table that you wish to backup.

For example, to take a snapshot of the users_copy table in the “snapshotting” keyspace, we can run the following command:

$ nodetool snapshot --table users_copy -t uc_snap snapshotting

The command creates a snapshot of the users_copy table and store it under the uc_snap directory.

Listing Snapshots

To view the available snapshots in the cluster, use the listsnapshot command as shown:

$ nodetool listsnapshots

You should get a list of all the available snapshots and details such as the snapshot name, to which keyspace they belong, the column family name, the size on disk, and the actual size.

Snapshot Details:
Snapshot name       Keyspace name      Column family name             True size Size on disk
uc_snap             snapshotting       users_copy                     0 bytes   5.87 KiB
1661397218984       system_schema      columns                        0 bytes   12.51 KiB
1661397218984       system_schema      types                          0 bytes   15.03 KiB
1661397218984       system_schema      indexes                        0 bytes   15.15 KiB
1661397218984       system_schema      keyspaces                      0 bytes   5.81 KiB
1661397218984       system_schema      dropped_columns                0 bytes   15.63 KiB
1661397218984       system_schema      aggregates                     0 bytes   15.4 KiB
1661397218984       system_schema      triggers                       0 bytes   15.15 KiB
1661397218984       system_schema      tables                         0 bytes   10.27 KiB
1661397218984       snapshotting       users                          0 bytes   5.86 KiB
1661397218984       snapshotting       users_copy                     0 bytes   5.87 KiB
snapshotting_backup snapshotting       users                          0 bytes   5.86 KiB
snapshotting_backup snapshotting       users_copy                     0 bytes   5.87 KiB
backups             snapshotting       users                          0 bytes   5.86 KiB
backups             snapshotting       users_copy                     0 bytes   5.87 KiB
1661397899477       snapshotting       users_copy                     0 bytes   5.87 KiB
Total TrueDiskSpaceUsed: 0 bytes

Conclusion

In this article, you learned how snapshotting works in Apache Cassandra. You also learned how to take snapshots of keyspaces, specific tables within a keyspace, and more.

Thanks for reading!

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list