- For reformatting the source code
- For cleaning up data
- For simplifying the command-line output
If we talk about leading whitespaces, they are relatively easy to spot as they are at the start of the text. However, it is not easy to spot the trailing whitespaces. The same is the case with double spaces which are also sometimes difficult to spot. It all becomes more challenging when you need to remove all those leading and trailing whitespaces from a document containing thousands of lines.
To remove whitespaces from your document, you can use various tools such as awk, sed, cut, and tr. In some other articles, we have discussed the use of awk in removing the whitespaces. In this article, we will be discussing the use of sed for removing whitespaces from the data.
You will learn how to use sed to:
- Remove all white spaces
- Remove leading whitespaces
- Remove trailing whitespaces
- Remove both leading and trailing whitespaces
- Replace multi spaces with single space
We will be running the commands on Ubuntu 20.04 Focal Fossa. You can also run the same commands on other Linux distributions. We will use the default Ubuntu Terminal application for running the commands. To open the Terminal, use the Ctrl+Alt+T keyboard shortcut.
What is Sed
Sed (stands for stream editor) is a very powerful and handy utility in Linux that allows us to perform basic text manipulations on the input streams. It is not a text editor, but it helps to manipulate and filter text. It receives the input streams and edits it according to the user’s instructions and then print the transformed text to the screen.
With sed, you can:
- Select text
- Search text
- Insert text
- Replace text
- Delete text
Using Sed to Remove Whitespaces
We will use the following syntax for removing whitespaces from the text:
Where
- s/: is substitution expression
- REGEXP: is a regular expression to match
- replacement: is the replacement string
- flags: We will only use the “g” flag to enable substitution globally on each line
Regular expressions
Some of the regular expressions we will use here are:
- ^ matches start of the line
- $ matches the end of line
- + matches one or more occurrences of the preceding character
- * matches zero or more occurrences of the preceding character.
For demonstration purpose, we will use the following sample file named “testfile”.
View All Whitespaces in a File
To find all the whitespaces in your file, pipe the output of cat command to the tr command like this:
This command replaces all the whitespaces in your file by (*) symbol, which makes it easier to spot all the whitespaces whether they are single, multiple, leading, or trailing whitespaces.
In the following screenshot, you can see the whitespaces are replaced by * symbol.
Remove All Whitespaces (Including Spaces and Tabs)
In some cases, you need to remove all whitespaces from the data, i.e. leading, trailing, and the whitespaces between the texts. The following command will remove all the whitespaces from the “testfile”.
Note: Sed does not alter your files unless you save the output to the file.
Output:
After running the above command, the following output appeared, which shows that all the whitespaces have been removed from the text.
You can also use the following command to verify that all whitespaces have been removed.
From the output, you can see that is no (*) symbol which means all the whitespaces have been removed.
To remove all whitespaces but only from a specific line (let’s say line number 2), you can use the following command:
Remove All Leading Whitespaces (Including Spaces and Tabs)
To remove all the whitespaces from the beginning of each line (leading whitespaces), use the following command:
Output:
The following output appeared after running the above command, which shows all the leading whitespaces have been removed from the text.
You can also use the following command to verify that all the leading whitespaces have been removed:
From the output, you can see there is no (*) symbol at the beginning of the lines which verifies that all the leading whitespaces are removed.
To remove the leading whitespaces from only a specific line (let’s say line number 2), you can use the following command:
Remove All Trailing Whitespaces (Including Spaces and Tabs)
To remove all the whitespaces from the end of each line (trailing whitespaces), use the following command:
Output:
The following output appeared after running the above command, which shows all the trailing whitespaces have been removed from the text.
You can also use the following command to verify that all trailing whitespaces have been removed.
From the output, you can see there is no (*) symbol at the end of the lines which verifies that all the trailing whitespaces are removed.
To remove the trailing whitespaces from only a specific line (let’s say line number 2), you can use the following command:
Remove both Leading and Trailing Whitespaces
To remove all the whitespaces from both the start and end of each line (i.e. both leading and trailing whitespaces), use the following command:
Output:
The following output appeared after running the above command, which shows that both the leading and trailing whitespaces have been removed from the text.
You can also use the following command to verify that both the leading and trailing whitespaces have been removed.
From the output, you can see there is no (*) symbol at the start or end of the lines which verifies that all the leading and trailing whitespaces are removed.
To remove both the leading and trailing whitespaces from only a specific line (let’s say line number 2), you can use the following command:
Replace Multiple Whitespaces with Single Whitespace
In some cases, there are multiple whitespaces at the same place in the file, but you only need single whitespace. You can do so by replacing those multiple spaces by a single space using sed.
The following command will replace all multiple whitespaces with single whitespace from each line in the “testfile”.
Output:
The following output appeared after running the above command, which shows the multiple whitespaces have been replaced with the single whitespace.
You can also use the following command to verify if multiple whitespaces are replaced with single whitespace:
From the output, you can see the single (*) symbol at each place which verifies that all the occurrences of the multiple whitespaces are replaced with a single whitespace.
So, this was all about removing the whitespaces from your data using sed. In this article, you have learned how to use sed to remove all whitespaces from your data, remove only the leading, or trailing whitespace, and remove both leading and trailing whitespace. You have also learned how to replace multi spaces with a single space. It will now be easy for you to remove whitespaces from a file containing hundreds or thousands of lines.