AWS

Redshift SUBSTRING Function

Redshift SUBSTRING is the function used to extract the required substring from the existing string based on the start position and the length of the substring. The purpose of the SUBSTRING function is to return the specific portion of the string. Database developers widely use this function in queries to get a specific portion of the string. This blog describes how the SUBSTRING function can be used in different ways to query data from the redshift cluster.

SUBSTRING Function Syntax

The basic and most commonly used syntax for the SUBSTRING function is as follows, where character_string is the main string from which we will extract substrings. The starting_position is the position of the first character of the substring in the main string, and the length_of_characters is the number of characters in the substring:

1
substring (character_string, starting_position, length_of_characters);

OR

1
substring (character_string FROM start_position [ FOR number_characters ]);

Also, you can extract a substring of binary data using the following syntax, where binary_expression is the binary data with data type VARBYTE. The starting_byte is the position of the first byte of the substring extracted from binary data, and the number_bytes is the number of bytes the substring will include.

If you do not specify the number_bytes in the SUBSTRING function, it will extract the substring starting from starting_byte to the end of the binary data.

1
substring(binary_expression, starting_byte, number_bytes );


OR

1
substring(binary_expression, starting_byte );

Using SUBSTRING Function

So far, we have discussed the syntax of the SUBSTRING function. In this section, we will see how this function can be used. Let us discuss some examples to understand the working of the SUBSTRING function:

The SUBSTRING function can be used to query data from the redshift cluster with the SELECT queries. This section will use the SUBSTRING function with the SELECT query.

The following query will return the REDSHIFT when executed on redshift:

1
select substring('AWS REDSHIFT',5,8);

The following table shows the character count in the AWS REDSHIFT string. The space between AWS and REDSHIFT is also counted as a character.

A W S R E D S H I F T
1 2 3 4 5 6 7 8 9 10 11 12

The starting position is 5, and the character at the 5th count is R. Length of the character is 8, which means starting from the letter, R, eight characters will be returned.


Similarly, the following query will return AWS when executed on redshift. The first character is A, and the total substring count is 3; that is why it will return the AWS substring.

1
select substring('AWS REDSHIFT',1,3);

If the starting_position+length_of_characters is greater than the total number of characters in the character_string, the redshift will return the value starting from starting position to the end of the string.

Also, if the length of the character is not specified, the substring function returns the value starting from starting position to the end of the string.

1
select substring('AWS REDSHIFT',7,8);

OR

1
select substring('AWS REDSHIFT',7);

In either case, the output will be DSHIFT, as shown in the figure below:


If the starting position is zero or less than zero(negative) value, the substring function will return a substring beginning at the first character of the character_string with a length of starting_position+length_of_characters-1.

1
select substring('AWS REDSHIFT',-3,7);

The output will be AWS, as shown in the figure below:


If the starting_position+length_of_characters-1 is less than or equal to zero, the SUBSTRING function will return an empty string.

1
select substring('AWS REDSHIFT',-5,4);

The output for this query will be empty, as shown in the figure below:


The following query will create the AWS_Redshift_substring table into the redshift cluster with the specified attributes:

1
2
3
4
5
6
7
create table AWS_Redshift_substring(

     S_no int,

     PersonName VARCHAR(255)

);

Insert data into the table using the INSERT method.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
insert into AWS_Redshift_substring values (1, 'Mr. Dwayne Johnson');

insert into AWS_Redshift_substring values (2, 'Mr. Vin Diesel');

insert into AWS_Redshift_substring values (3, 'Mr. Will Smith');

insert into AWS_Redshift_substring values (4, 'Mr. Robert Downey Jr.');

insert into AWS_Redshift_substring values (5, 'Mr. Chris Hemsworth');

insert into AWS_Redshift_substring values (6, 'Mr. Leonardo DiCaprio');

insert into AWS_Redshift_substring values (7, 'Mr. Ryan Reynolds');

insert into AWS_Redshift_substring values (8, 'Mr. Chris Pratt');

insert into AWS_Redshift_substring values (9, 'Mr. Hugh Jackman');

insert into AWS_Redshift_substring values (10, 'Mr. Mark Wahlberg');

The previous queries will insert data into the table. Now, select the values from the table by using the following query:

1
select * from aws_redshift_substring;
s_no personname
1 Mr. Dwayne Johnson
2 Mr. Vin Diesel
3 Mr. Will Smith
4 Mr. Robert Downey Jr.
5 Mr. Chris Hemsworth
6 Mr. Leonardo DiCaprio
7 Mr. Ryan Reynolds
8 Mr. Chris Pratt
9 Mr. Hugh Jackman
10 Mr. Mark Wahlberg

Now, apply some substring functions on this table. We extract only the names, neglecting the title. For this, run the following query:

1
select *, SUBSTRING(PersonName , 4) AS new_name from aws_redshift_substring;

This query will return the substrings of the personname column into a new column.

s_no personname new_name
1 Mr. Dwayne Johnson Dwayne Johnson
2 Mr. Vin Diesel Vin Diesel
3 Mr. Will Smith Will Smith
4 Mr. Robert Downey Jr. Robert Downey Jr.
5 Mr. Chris Hemsworth Chris Hemsworth
6 Mr. Leonardo DiCaprio Leonardo DiCaprio
7 Mr. Ryan Reynolds Ryan Reynolds
8 Mr. Chris Pratt Chris Pratt
9 Mr. Hugh Jackman Hugh Jackman
10 Mr. Mark Wahlberg Mark Wahlberg


Similarly, we can get substrings of personname using the following SELECT query along with the SUBSTRING function:

1
select *, SUBSTRING(PersonName from 1 for 3) AS title from aws_redshift_substring;

This will extract the only title while neglecting the full name.

S_no personname title
1 Mr. Dwayne Johnson Mr.
2 Mr. Vin Diesel Mr.
3 Mr. Will Smith Mr.
4 Mr. Robert Downey Jr. Mr.
5 Mr. Chris Hemsworth Mr.
6 Mr. Leonardo DiCaprio Mr.
7 Mr. Ryan Reynolds Mr.
8 Mr. Chris Pratt Mr.
9 Mr. Hugh Jackman Mr.
10 Mr. Mark Wahlberg Mr.

Conclusion

This article discussed how we could use the SUBSTRING function to run queries in the redshift cluster. Extracting modified data from the database is a big task and can be performed using the SUBSTRING function. Using the SUBSTRING function can help avoid post query data modification as it gets the modified data.

About the author

Zainab Rehman

I'm an author by profession. My interest in the internet world motivates me to write for Linux Hint and I'm here to share my knowledge with others.