Apache Kafka

Apache Kafka KSQL Examples

KSQL is an open-source, streaming SQL engine for Apache Kafka that allows the users to write the SQL-like queries on Kafka topics and streams. It was developed by Confluent and is designed to simplify the process of processing the real-time data streams.

KSQL is built on top of Kafka Streams, a Java library for building stream processing applications that provides a high-level API for creating stream processors. However, KSQL offers a simpler and more familiar interface by providing an SQL-like syntax to define the stream processing logic.

In this tutorial, we will cover some basic KSQL example queries that you can use to work with Apache Kafka in an SQL-like manner.

KSQL Create Stream

To create a stream, we can use the CREATE STREAM statement.

For example, to create a stream named “info” with three columns (timestamp, userID, and link), we can use the following KSQL statement:

CREATE STREAM clicks (timestamp BIGINT, userID VARCHAR, link VARCHAR) WITH (kafka_topic='info-topic’, value_format='AVRO');

KSQL List Streams

To list all the streams in KSQL, we can use the following command:

SHOW STREAMS;

The command displays the list of all the available streams in your KSQL environment, along with their metadata such as the name of the stream, the Kafka topic, etc.

KSQL Create Table

To create a table in KSQL, we can use the CREATE TABLE statement. An example command is as follows:

CREATE TABLE info AS SELECT WINDOWSTART() AS hour, COUNT(*) AS count FROM clicks WINDOW TUMBLING (SIZE 1 HOUR) GROUP BY WINDOWSTART();

KSQL List Tables

To list all the tables in KSQL, we can use the following command:

SHOW TABLES;

Similarly, this command displays the list of all the available tables in your KSQL environment and the related metadata.

KSQL Filter Data

To filter data in a stream, use the SELECT statement with a WHERE clause. An example command is as follows:

SELECT * FROM info WHERE url = 'google.com';

This should return the records that match the specified criteria.

KSQL Aggregate Data

We can also perform the data aggregation using KSQL through the GROUP BY clause as shown in the following example:

SELECT userID, COUNT(*) FROM info GROUP BY userID;

KSQL Drop Table

To drop the existing table, use the query as follows:

DROP TABLE [IF EXISTS] table_name [DELETE TOPIC];

KSQL Pull Query

To select the data using a pull query, use the following command:

SELECT select_expr [, ...]
  FROM from_item
  [ WHERE where_condition ]
  [ AND window_bounds ]
  [ LIMIT count ];

KSQL Create Connector

To create a new connector in the Kafka Connect cluster with the provided configuration with a WITH clause, use the following command:

CREATE SOURCE | SINK CONNECTOR [IF NOT EXISTS] connector_name WITH( property_name = expression [, ...]);

KSQL Print Topic Contents

We can also print the contents of a Kafka topic using the print clause as shown in the following:

PRINT topicName [FROM BEGINNING] [INTERVAL | SAMPLE interval] [LIMIT limit]

KSQL List Topics

To show the available topics in a cluster, use the following command:

SHOW | LIST [ALL] TOPICS [EXTENDED];

Conclusion

We covered some common examples of using the KSQL language to query a Kafka cluster. Feel free to check out our tutorials on KSQL to learn more.

About the author

John Otieno

My name is John and am a fellow geek like you. I am passionate about all things computers from Hardware, Operating systems to Programming. My dream is to share my knowledge with the world and help out fellow geeks. Follow my content by subscribing to LinuxHint mailing list