Batch processing is an important feature when working with databases. Batches not only let developers execute a series of instructions as a single unit, but they can also provide atomicity for those instructions on the target cluster.
In this post, we will cover the basics of batch processing in Apache Cassandra, which lets you combine a series of statements in a single execution context.
Let’s dive in.
Cassandra Batch Processing Syntax
The following shows the syntax of the batch statement in Cassandra:
BEGIN [ ( UNLOGGED | COUNTER ) ] BATCH
[ USING TIMESTAMP [ epoch_microseconds ] ]
dml_statement [ USING TIMESTAMP [ epoch_microseconds ] ] ;
[ dml_statement [ USING TIMESTAMP [ epoch_microseconds ] ] [ ; ... ] ]
APPLY BATCH ;
You can use INSERT, UPDATE, or DELETE statements in a batch.
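As a minimal sketch of a batch that mixes the three statement types, the following assumes a hypothetical table named users with an int id primary key and a text username column, and a keyspace that is already selected. The table name, ids, and usernames are illustrative only:

```cql
-- Hypothetical table for illustration; assumes a keyspace was selected with USE.
CREATE TABLE IF NOT EXISTS users (id int PRIMARY KEY, username text);

-- One batch combining an INSERT, an UPDATE, and a DELETE.
BEGIN BATCH
  INSERT INTO users (id, username) VALUES (101, 'alice');
  UPDATE users SET username = 'bob' WHERE id = 102;
  DELETE FROM users WHERE id = 103;
APPLY BATCH;
```

Note that a batch is sent with a single APPLY BATCH, so the three statements reach the coordinator together rather than as three separate requests.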
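The UNLOGGED keyword defines whether the batch is logged or not. Batches are logged by default. A logged batch provides atomicity: Cassandra first records the batch in a batch log so that, once the batch is accepted, either all of its statements are eventually applied or none of them are. An unlogged batch skips the batch log, so if it fails partway through, some of its statements may have been applied while others have not.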
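To make the difference concrete, here is a hedged sketch of the same two inserts written first as a logged batch and then as an unlogged one. It assumes a hypothetical users table with an int id primary key and a text username column; the names and values are placeholders:

```cql
-- Hypothetical table for illustration; assumes a keyspace was selected with USE.
CREATE TABLE IF NOT EXISTS users (id int PRIMARY KEY, username text);

-- Logged batch (the default): the batch goes through the batch log.
BEGIN BATCH
  INSERT INTO users (id, username) VALUES (201, 'carol');
  INSERT INTO users (id, username) VALUES (202, 'dave');
APPLY BATCH;

-- Unlogged batch: the batch log is skipped, so a failure partway
-- through can leave only some of the writes applied.
BEGIN UNLOGGED BATCH
  INSERT INTO users (id, username) VALUES (201, 'carol');
  INSERT INTO users (id, username) VALUES (202, 'dave');
APPLY BATCH;
```

The only syntactic difference is the UNLOGGED keyword after BEGIN. Which form is appropriate depends on whether the writes must succeed or fail together.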
The USING TIMESTAMP clause sets the write time, in microseconds since the epoch, for the writes carried out by the statements in the batch. You can set a single timestamp at the batch level, which applies to every write in the batch, or you can attach a timestamp only to the individual statements that need one.
The following form sets a single timestamp for the whole batch:
BEGIN BATCH USING TIMESTAMP [ epoch_microseconds ]
statement_1;
…
statement_n;
APPLY BATCH ;
In the previous example, Cassandra applies the same timestamp to every write made by the statements in the batch.
The following form sets a timestamp on an individual statement instead:
BEGIN BATCH
statement_1;
statement_2 USING TIMESTAMP [ epoch_microseconds ] ;
statement_n;
APPLY BATCH ;
In this case, Cassandra applies the specified timestamp only to the writes made by statement_2.
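A concrete version of the per-statement form might look like the sketch below. It assumes a hypothetical users table with an int id primary key and a text username column, and an arbitrary timestamp value chosen only for illustration. The WRITETIME function is used afterwards to inspect the write time stored for each row:

```cql
-- Hypothetical table for illustration; assumes a keyspace was selected with USE.
CREATE TABLE IF NOT EXISTS users (id int PRIMARY KEY, username text);

-- Only the second INSERT carries an explicit timestamp (microseconds since the epoch).
BEGIN BATCH
  INSERT INTO users (id, username) VALUES (301, 'erin');
  INSERT INTO users (id, username) VALUES (302, 'frank') USING TIMESTAMP 1700000000000000;
  INSERT INTO users (id, username) VALUES (303, 'grace');
APPLY BATCH;

-- Inspect the write time recorded for the username column of each row.
SELECT id, username, WRITETIME(username) FROM users WHERE id IN (301, 302, 303);
```

If these rows did not exist before the batch ran, the row with id 302 should report the explicit timestamp, while the other two report a timestamp assigned at write time.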
Example:
The following example illustrates how to use a batch of INSERT statements. We start with a table that has an integer id as its primary key and a text username column:
... id int,
... username text,
... primary key(id));
Once the table is set up, we can run a batch insert that sets one timestamp for the whole batch.
The query should perform a batch insert into the specified table. Since the timestamp is set at the batch level, it applies to all the statements, so all the inserted records share the same timestamp.
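As a minimal sketch of such a batch insert, assume the table is named users (a hypothetical name) and pick an arbitrary timestamp in microseconds since the epoch. Both are placeholders, as are the inserted values:

```cql
-- Assumes a table named users (hypothetical) with columns id int and username text.
-- The batch-level timestamp applies to every INSERT in the batch.
BEGIN BATCH USING TIMESTAMP 1700000000000000
  INSERT INTO users (id, username) VALUES (1, 'alice');
  INSERT INTO users (id, username) VALUES (2, 'bob');
  INSERT INTO users (id, username) VALUES (3, 'carol');
APPLY BATCH;
```

Running SELECT id, WRITETIME(username) FROM users; afterwards is one way to confirm which write time each row received.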
Conclusion
In this article, we covered the basics of batch processing in Apache Cassandra. Keep in mind that this is a basic tutorial. There is a lot more to consider when working with batches, such as single-partition and multi-partition batches, target keyspaces, performance implications, asynchronous statements, and more. We recommend checking the documentation for detailed information.