Apache Cassandra

Cassandra Aggregates

“If you are coming from SQL databases, you are familiar with an aggregate function. These functions allow one to perform an aggregate operation on a select statement and return the result.

An aggregate function takes a set of values, performs a specific function, and returns a single value. An example includes calculating the average of a given set of values.

Cassandra provides us with a set of native functions that allows us to perform common aggregation operations. Join us in this tutorial as we explore these types of functions and how we can use them.”

Aggregate Function Syntax

The following shows the simple syntax of a Cassandra aggregate function:

aggregate_name(column_name)

The aggregate function will then take the values of the specified column, perform the operation and return the resulting value.

Cassandra Aggregate Functions

Cassandra supports the following aggregate functions:

AVG Function

The AVG function allows you to calculate the average value of the given column. The function syntax is as shown:

AVG(column_name)

The function will ignore any NULL values in the column. However, the function supports data types such as int, double, float, biging, tinyint, smallint, and decimal.

COUNT Function

As the name suggests, this function returns the number of records that do not contain null values. The function syntax is as shown:

COUNT(column_name)

The function will work on Cassandra’s data types.

MIN Function

This function will return the smallest value in a given column. The function syntax is as shown:

MIN(column_name)

Keep in mind that this function will only work on numerical data types.

MAX Function

This function will return the largest value in a given column. Similarly, the function will work on numerical data types and ignore null values.

The function syntax is as shown:

MAX(column_name)

SUM Function

Finally, the sum function will return the sum of numerical values in a given column.

The function syntax is as shown:

SUM(column_name)

Let us now look at examples of how to use these functions.

Suppose we have a table as shown:

create table products(
    id int,
    product_name text,
    price int,
    qty int,
    primary key(id)
);
begin batch using timestamp 1664053851
    insert into products(id, product_name, price, qty)
    values (1, 'product_1', 100, 5640);

    insert into products(id, product_name, price, qty)
    values (2, 'product_2', 800, 550);

    insert into products(id, product_name, price, qty)
    values (3, 'product_3', 500, 5550);

    insert into products(id, product_name, price, qty)
    values (4, 'product_3', 150, 540);

    insert into products(id, product_name, price, qty)
    values (5, 'product_4', 160, 260);

    insert into products(id, product_name, price, qty)
    values (6, 'product_5', 130, 5640);
apply batch;

Finding Average Value

We can use the avg function to calculate the average price of a product in the table as shown in the query below:

select avg(price) from products where product_name = ‘product_1’;

Find the Number of Rows

To determine the number of non-null records, run the query:

select count(product_name) from products;

Output:

system.count(product_name)
----------------------------
                          6
(1 rows)

Find Minimum Value

To determine the lowest price from the products table, run the following:

select product_name, min(price) from products;

Find Maximum Value

To find out the most expensive product on the list, run:

select product_name, max(price) from products;

Finding Total

Calculate the total price:

select  sum(price) from products;

Conclusion

Aggregates are some of the most important features that Cassandra allows you to include in your cluster. Instead of manually performing recurring operations, you can define an aggregate that can then be recalled on multiple objects or even used in a different cluster. We hope this tutorial gave you a first foot into Cassandra aggregates.

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list