C++ String trim Methods

Trimming a string means removing the white space in front of and behind the text of the string. So what counts as white space? The following is a list of the white-space characters that can occur in a string:

  • ‘ ’ or ‘\040’: space, typed with the spacebar key
  • ‘\n’: line feed
  • ‘\r’: carriage return
  • ‘\f’: form feed
  • ‘\t’: horizontal tab
  • ‘\v’: vertical tab

C++ does not have a built-in function to trim a string. However, there is a subject in computer programming called regular expressions, abbreviated regex. Regular expressions enable the programmer to search for a sub-string in a target string and replace the sub-string found. The sub-string found can be replaced with nothing, which erases it.

This search-and-replace-with-nothing idea can be used to trim a string: look for all the white-space characters in front of the text and all the white-space characters behind it, and replace them with nothing. Luckily, C++ has a regex library, which has to be included in the program to do this.

Summary of Regular Expressions

Regex
Consider the string:

    "This is it for the show"

The first four characters of this string form the sub-string, “This”. The last four characters of the string form the last sub-string, “show”.

Now, the whole string is called the target string or simply target. The sub-string “This” or “show” is called the regular expression or simply, regex.

Matching
If “This” is searched for and located in the target, then matching is said to have occurred. If “show” is searched for and located, then matching is still said to have occurred. Matching occurs for any target string when a sub-string is found. The sub-string can be replaced. For example, “This” can be replaced with “Here” and “show” can be replaced with “game” to have the new target,

    "Here is it for the game"

If the first and last words were not wanted at all, then they could be replaced with nothing, to have,

    " is it for the "

This last result happens to be an unconventional trimming, which unfortunately still leaves one space at the beginning and another space at the end.

Pattern
A plain sub-string (“This” or “show”), as illustrated above, is a simple pattern. Consider the following target:

    "Hey, that is a bat on the middle of the road."

The programmer may want to know whether the animal is a rat, a cat, or a bat, since these three words sound alike. A pattern is needed to identify the word “cat”, “rat”, or “bat”. Notice that each of these words ends with “at” but begins with ‘b’, ‘c’, or ‘r’. The pattern to match any of these three words is

    [bcr]at

This means, match ‘b’ or ‘c’ or ‘r’, followed by “at”.

Repetition
x* : match ‘x’ 0 or more times, i.e., any number of times.

Matching Examples
The following program produces a match for “bat” in the target string, using the regex object, reg(“[bcr]at”), whose pattern is [bcr]at.

    #include <iostream>
    #include <regex>
    using namespace std;

    int main()
    {

        regex reg("[bcr]at");
        if (regex_search("Hey, that is a bat on the middle of the road.", reg))
            cout << "matched" << endl;
        else
            cout << "not matched" << endl;

        return 0;
    }

The output is: matched.

The regex library is included with “#include <regex>”. The regex object is instantiated with the statement,

    regex reg("[bcr]at");

The regex_search() function from the library takes two arguments here. The first one is the target string. The second one is the regex object. The pattern [bcr]at matched “bat”, and so the regex_search() function returned true. Otherwise, it would have returned false.

The following program illustrates a match of the pattern, bo*k for “book”:

    #include <iostream>
    #include <regex>
    using namespace std;

    int main()
    {

        regex reg("bo*k");
        if (regex_search("the book is good.", reg))
            cout << "matched" << endl;
        else
            cout << "not matched" << endl;

        return 0;
    }

The output is: matched. o* means match ‘o’ zero or more times. Here, it matched ‘o’ twice, in “book”.

Matching Beginning of Target String
To match the beginning of a target string, the pattern has to begin with ^ . The following program matches “This” at the beginning of the target string, “This is it for the show”.

    #include <iostream>
    #include <regex>
    using namespace std;

    int main()
    {
        regex reg("^This");
        if (regex_search("This is it for the show", reg))
            cout << "matched" << endl;
        else
            cout << "not matched" << endl;

        return 0;
    }

The output is: matched. Notice the regex literal, "^This" .

Matching End of Target String
To match the end of a target string, the pattern has to end with $. The following program matches “show” at the end of the target string, “This is it for the show”.

    #include <iostream>
    #include <regex>
    using namespace std;

    int main()
    {

        regex reg("show$");
        if (regex_search("This is it for the show", reg))
            cout << "matched" << endl;
        else
            cout << "not matched" << endl;

        return 0;
    }

The output is: matched. Notice the regex literal, "show$" .

Matching Alternatives
To match the beginning sub-string or the end sub-string, the | meta-character has to separate the beginning and end patterns in the overall pattern. The following program illustrates this:

    #include <iostream>
    #include <regex>
    using namespace std;

    int main()
    {

        regex reg("^This|show$");
        if (regex_search("This is it for the show", reg))
            cout << "matched" << endl;
        else
            cout << "not matched" << endl;

        return 0;
    }

The output is: matched. Notice the regex literal, "^This|show$" .

Now, the regex_search() function stops at the first match it finds. In this case, it matches “This” at the beginning of the target and stops, without going on to match “show” at the end of the target.

Luckily, in its default mode, the regex_replace() function of the C++ regex library replaces every match of the pattern, wherever it occurs in the target string. This makes regex_replace() well suited to trimming strings: look for the whole run of white space in front of the text and the whole run of white space behind it, and replace both with nothing.

Search and Replace

The following program replaces the first and last words of the target string with the word “Dog”:

    #include <iostream>
    #include <regex>
    #include <string>
    using namespace std;

    int main()
    {
        char str[] = "This is it for the show";
        string newStr = regex_replace(str, regex("^This|show$"), "Dog");
        cout << newStr << endl;  

        return 0;
    }

The output is:

    Dog is it for the Dog

The program uses the regex_replace() function. The first argument is the target string. The second argument is the regex object. The third argument is the replacement string literal. The return value is a string object holding the modified text, which is why the string header is included.

Trimming Proper

Consider the string:

    "\t I want democracy! \n"

Two white-space characters, ‘\t’ and ‘ ’, are in front of the useful text. Another two white-space characters, ‘ ’ and ‘\n’, are behind the useful text. Trimming means removing all white-space characters in front of the text and removing all white-space characters behind the text.

To match either of the first two characters here, the pattern is "\t| ", that is, ‘\t’ or one space. To match either of the last two characters here, the pattern is " |\n", that is, one space or ‘\n’. However, the programmer usually does not know which white-space characters a particular string begins or ends with. So the best thing to do is to account for every white-space character, with the pattern " |\t|\n|\r|\v|\f". Note the use of the regex OR operator, | .

There is still a problem. The pattern " |\t|\n|\r|\v|\f" matches only one white-space character at a time, because each alternative separated by | is a single character. To match a whole run of white-space characters, list the characters inside square brackets, as with [bcr] above, and follow the brackets with * so that any of them is matched zero or more times. Inside square brackets, the characters are listed without | separators; a | placed there would stand for a literal vertical bar, which would then be trimmed as well. The pattern to match consecutive white-space characters is

    "[ \t\n\r\v\f]*"

To match consecutive white-space characters at the start of the string, use,

    "^[ \t\n\r\v\f]*"

Note the presence and position of ^ .

To match consecutive white-space characters at the end of the string, use,

    "[ \t\n\r\v\f]*$"

Note the presence and position of $ . And to match consecutive white-space characters at the start OR at the end of the string, use,

    "^[ \t\n\r\v\f]*|[ \t\n\r\v\f]*$"

Note the use of | in the middle of the overall pattern, outside the square brackets, where it acts as the OR operator.

After matching, all the white-space characters are replaced with nothing, that is, “”, the empty string. Remember that the regex_replace() function replaces all occurrences of sub-strings matched to the pattern all over the target string.

The following program trims the target string, “\t I want democracy! \n”, to “I want democracy!”:

    #include <iostream>
    #include <regex>
    #include <string>
    using namespace std;

    int main()
    {
        char str[] = "\t I want democracy! \n";
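        // Inside [...] the characters are listed without | separators,
        // since a | there would match a literal vertical bar.
        string retStr = regex_replace(str, regex("^[ \t\n\r\v\f]*|[ \t\n\r\v\f]*$"), "");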
        cout << retStr << endl;  
   
        return 0;
    }

The output is:

    I want democracy!

Conclusion

Trimming a string means removing the white space in front of and behind the text of the string. White space consists of white-space characters, which are ‘ ’, ‘\n’, ‘\r’, ‘\f’, ‘\t’, and ‘\v’. To trim a string in C++, include the regex library and use the regex_replace() function to search and replace: replace any white space at the start and/or at the end of the string with the empty string.

About the author

Chrysanthus Forcha

Discoverer of mathematics Integration from First Principles and related series. Master’s Degree in Technical Education, specializing in Electronics and Computer Software. BSc Electronics. I also have knowledge and experience at the Master’s level in Computing and Telecommunications. Out of 20,000 writers, I was the 37th best writer at devarticles.com. I have been working in these fields for more than 10 years.