
Apache Cassandra Batch

Batch processing is an important feature when working with databases. It lets developers submit a series of write statements from a single script, and in Cassandra a logged batch also guarantees atomicity: if any of its statements are applied on the target cluster, all of them will eventually be applied.

In this post, we will cover the basics of batch processing in Apache Cassandra, which lets you send a group of write statements to the cluster as a single request.

Let’s dive in.

Cassandra Batch Processing Syntax

The following shows the syntax of the batch statement in Cassandra:

BEGIN [ ( UNLOGGED | COUNTER ) ] BATCH
[ USING TIMESTAMP [ epoch_microseconds ] ]
dml_statement [ USING TIMESTAMP [ epoch_microseconds ] ] ;
[ dml_statement [ USING TIMESTAMP [ epoch_microseconds ] ] [ ; ... ] ]
APPLY BATCH ;

A batch can contain INSERT, UPDATE, and DELETE statements. SELECT statements are not allowed in a batch.
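As a sketch, assuming a users table with id and username columns like the one created in the example later in this post, a single batch can mix all three allowed statement types. The ids and names are placeholders chosen for illustration.

```cql
-- Hypothetical batch mixing INSERT, UPDATE, and DELETE
-- against a users (id int PRIMARY KEY, username text) table.
BEGIN BATCH
  INSERT INTO users (id, username) VALUES (10, 'alice');
  -- UPDATE is an upsert in Cassandra: row 11 is created if it does not exist.
  UPDATE users SET username = 'bob' WHERE id = 11;
  DELETE FROM users WHERE id = 12;
APPLY BATCH;
```

Each statement here targets a different partition key, so this is a multi-partition batch.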

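By default, a batch is logged. A logged batch is atomic: Cassandra first writes the batch to a batchlog, so if any of its statements are applied, all of them will eventually be applied. Atomicity is not isolation, though; a batch that spans multiple partitions can be observed partially applied by other clients. The UNLOGGED keyword skips the batchlog, which reduces overhead but drops the atomicity guarantee. The COUNTER keyword is required for batches that update counter columns, and counter and non-counter updates cannot be mixed in the same batch.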
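A minimal sketch of the other two batch types follows. The users table is the one from the example later in this post; the page_views counter table is hypothetical and exists only for this illustration.

```cql
-- Unlogged batch: no batchlog, so no atomicity guarantee.
BEGIN UNLOGGED BATCH
  INSERT INTO users (id, username) VALUES (3, 'username3');
  INSERT INTO users (id, username) VALUES (4, 'username4');
APPLY BATCH;

-- Hypothetical counter table: every non-key column must be a counter.
CREATE TABLE page_views (
  page text PRIMARY KEY,
  views counter
);

-- Counter batch: every statement must be a counter update.
BEGIN COUNTER BATCH
  UPDATE page_views SET views = views + 1 WHERE page = 'home';
  UPDATE page_views SET views = views + 1 WHERE page = 'about';
APPLY BATCH;
```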

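The USING TIMESTAMP clause sets the write time, in microseconds since the Unix epoch, for the data written by the batch. If a statement has no timestamp of its own, it uses the same timestamp as the rest of the batch: either the one provided at the batch level or one Cassandra generates automatically. You can set one timestamp for the whole batch or set timestamps on individual statements, but not both; Cassandra rejects a batch that has a timestamp at the batch level and on one of its statements.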

To apply one timestamp to the whole batch, place the clause right after BEGIN BATCH:

BEGIN BATCH USING TIMESTAMP [ epoch_microseconds ]
statement_1;
...
statement_n;
APPLY BATCH ;

In the previous example, Cassandra applies the supplied timestamp to every write made by the statements in the batch.

To apply a timestamp to a single statement instead, attach the clause to that statement:

BEGIN BATCH
statement_1;
statement_2 USING TIMESTAMP [ epoch_microseconds ] ;
statement_n;
APPLY BATCH ;

In this case, the supplied timestamp applies only to the writes made by statement_2. The other statements share the timestamp that Cassandra generates for the batch.
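With concrete statements, where the clause goes depends on the statement type: in an INSERT it follows the VALUES list, while in an UPDATE it sits between the table name and SET. The sketch below assumes the users table from the example that follows; the timestamp is an arbitrary value in microseconds.

```cql
BEGIN BATCH
  -- No timestamp of its own: uses the timestamp generated for the batch.
  INSERT INTO users (id, username) VALUES (5, 'username5');
  -- Per-statement timestamp on an INSERT goes after VALUES.
  INSERT INTO users (id, username) VALUES (6, 'username6')
    USING TIMESTAMP 1664050149000000;
  -- Per-statement timestamp on an UPDATE goes before SET.
  UPDATE users USING TIMESTAMP 1664050149000000
    SET username = 'username7' WHERE id = 7;
APPLY BATCH;
```

Because statements without their own timestamp all share one, the order in which they are listed does not determine the order in which Cassandra resolves them; per-statement timestamps are the way to force a particular ordering.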

Example:

The following example shows how to use a batch with INSERT statements. First, create a table:

cassandra@cqlsh:testing> create table users(
... id int,
... username text,
... primary key(id));

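Once the table is set up, we can run a batch insert as follows. Note that USING TIMESTAMP expects microseconds since the epoch, so a time in seconds such as 1664050149 must be multiplied by 1,000,000:

cassandra@cqlsh:testing> begin batch using timestamp 1664050149000000
... insert into users (id, username) values (1, 'username1');
... insert into users (id, username) values (2, 'username2');
... apply batch;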

This statement inserts both rows into the table in a single batch. Because the timestamp is set at the batch level, both records share the same write timestamp.
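To check the result, you can read each row's write time with the WRITETIME() function, which returns the timestamp in microseconds. This is a sketch against the users table created above.

```cql
-- WRITETIME() works on regular columns such as username,
-- not on primary key columns such as id.
SELECT id, username, WRITETIME(username) FROM users WHERE id IN (1, 2);
```

Both rows should report the same value, matching the timestamp given in the batch.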

Conclusion

In this article, we covered the basics of working with batches in Apache Cassandra. Keep in mind that this is a basic tutorial. There is a lot more to consider when working with batch processing, such as single-partition versus multi-partition batches, target keyspaces, performance implications, asynchronous statements, and more. We recommend checking out the documentation for detailed information.

About the author

John Otieno

My name is John, and I am a fellow geek like you. I am passionate about all things computers, from hardware and operating systems to programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to the LinuxHint mailing list.