Apache Cassandra

Cassandra Select

Apache Cassandra is one of the leaders in the wide-column NoSQL databases. In addition, it is a free and open-source solution for schemeless data. As a result, Cassandra is a popular choice in environments that need to handle extensive scale data with minimal latency and high-availability features.

Cassandra provides the Cassandra Query Language with a powerful query language for interacting with Cassandra keyspaces. CQL is closely similar to SQL and provides similar commands and syntax for similar operations. This makes Cassandra easily approachable by developers from relational databases.

In this tutorial, we will learn how to use the SELECT command in Cassandra. The SELECT command allows us to fetch the records from a Cassandra table.

Let’s dive in.

Command Syntax

The command follows a relatively simple syntax. The command syntax is as shown in the following snippet code:

SELECT * | select_expression | DISTINCT partition
FROM [keyspace_name.] table_name
[WHERE partition_value
[AND clustering_filters
[AND static_filters]]]
[ORDER BY PK_column_name ASC|DESC]
[LIMIT N]
[ALLOW FILTERING]

The command then returns the specified rows from the target table:

NOTE: Since Cassandra is designed to handle a large-scale data, it is not recommended to fetch all the rows from a table as you would in a relational database.

Sample Data

Before learning to select the data from a table using the SELECT statement, let’s set up a sample data for illustration.

Start by creating a sample keyspace:

CREATE KEYSPACE linuxhint WITH REPLICATION = { 'class': 'SimpleStrategy', 'replication_factor': 3, };

Switch the current keyspace using the following command:

USE linuxhint;

Create a sample table using the following command:

cqlsh:linuxhint> CREATE TABLE users( id int, username text, email text, active int, PRIMARY KEY(id));

Add a sample data as shown:

cqlsh:linuxhint> INSERT INTO users(id, username, email, active) VALUES (0, 'user1', '[email protected]', 1);
cqlsh:linuxhint> INSERT INTO users(id, username, email, active) VALUES (1, 'user2', '[email protected]', 1);
cqlsh:linuxhint> INSERT INTO users(id, username, email, active) VALUES (2, 'user3', '[email protected]', 0);
cqlsh:linuxhint> INSERT INTO users(id, username, email, active) VALUES (3, 'user4', '[email protected]', 0);
cqlsh:linuxhint> INSERT INTO users(id, username, email, active) VALUES (4, 'user5', '[email protected]', 1);

Examples

The following section provides the examples of performing the common select statements in a Cassandra cluster.

Example 1: Select All Record

In Cassandra, we can select all the records from a table using the query as shown:

cqlsh:linuxhint> select * from users;

We use the SELECT statement in this query to select all the user table columns.

The previous query should return the output as shown:

id | active | email         | username
----+--------+---------------+----------
1 |      1 | user2@mail.to |    user2
0 |      1 | user1@mail.to |    user1
2 |      0 | user3@mail.to |    user3
4 |      1 | user5@mail.to |    user5
3 |      0 | user4@mail.to |    user4

NOTE: As mentioned, this is not recommended for a large dataset.

Example 2: Select Specific Columns

We can select specific columns from a table by specifying the columns as comma-separated values.

For example, to select the id, username, and email columns, we can run the following command:

cqlsh:linuxhint> SELECT id, email, username FROM users;

In this case, the command should return an output as follows:

id | email         | username
----+---------------+----------
1 | user2@mail.to |    user2
0 | user1@mail.to |    user1
2 | user3@mail.to |    user3
4 | user5@mail.to |    user5
3 | user4@mail.to |    user4

(5 rows)

Example 3: Using Column Aliases

To make the output more readable, we can use the column aliases that give custom names to the output result.

For example:

cqlsh:linuxhint> SELECT id AS user_index, email AS email_address, username AS identifier FROM users;

The AS keyword in the query allows you to rename the column to an alias name without making permanent changes to the table.

For example, the previous query returns the result as:

user_index | email_address | identifier
------------+---------------+------------
1 | user2@mail.to |      user2
0 | user1@mail.to |      user1
2 | user3@mail.to |      user3
4 | user5@mail.to |      user5
3 | user4@mail.to |      user4

(5 rows)

Note that the id, email, and username columns are renamed to user_index, email_address, and identifier, respectively.

Example 4: Limiting the Number of Rows

Using the LIMIT option, we can also limit the number of records returned by the SELECT query. The LIMIT clause takes an integer representing the number of rows to retrieve.

For example, to fetch the first three rows, we can run the following command:

cqlsh:linuxhint> SELECT * FROM users LIMIT 3;

The resulting output is as follows:

id | active | email         | username
----+--------+---------------+----------
1 |      1 | user2@mail.to |    user2
0 |      1 | user1@mail.to |    user1
2 |      0 | user3@mail.to |    user3

(3 rows)

Example 5: Filtering Data

We can also fetch the data that matches a specific condition using the WHERE clause. This allows us to specify a condition that returns true/false and returns the matching values.

For example, to fetch all the rows where the value of the active column is equal to 1, we can do the following:

cqlsh:linuxhint> SELECT * FROM users WHERE active = 1 ALLOW FILTERING;

The previous query should return only the rows where the active column = 1.

id | active | email         | username
----+--------+---------------+----------
1 |      1 | user2@mail.to |    user2
0 |      1 | user1@mail.to |    user1
4 |      1 | user5@mail.to |    user5

(3 rows)

NOTE: The ALLOW FILTERING command allows Cassandra to use the data filtering features.

RECOMMENDATION: Only use the ALLOW FILTERING for development purposes only.

Conclusion

In this article, you learned the fundamentals of selecting the data from a Cassandra table using the CQL SELECT statement.

Thanks for reading!

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list