A string can be created in two main ways: as an array of chars (accessed through a const char*), or by instantiating the string class. When the string class is used, the string library has to be included in the C++ program. Identifying, returning, deleting, and replacing a substring in C++ is normally done only with a string object instantiated from the string class.
The string object is a data structure with methods (member functions). It holds a sequence of elements, and each element is a character. Like an array, each character of the string object can be accessed by an index. So, a sub-string can be identified by two indexes: a lower index and a higher index. The range runs from the lower index up to, but not including, the higher index. The sub-string therefore ends at the character just before the higher index, and its length is the higher index minus the lower index.
Two iterators can also identify a substring or range: the first iterator points to the start of the range, and the last iterator points to the position just after the actual last character of the range (which may be the end-of-string position). There is a simple relationship between iterator and index; see below.
This article explains what a substring is and how to identify, return, delete and replace a substring in C++.
Article Content
- Identifying and Returning a Substring
- Relating Iterator and Index
- Deleting a Substring
- Replacing a Substring
- Conclusion
Identifying and Returning a Substring
The C++ string class has a member function called substr(), short for sub-string. The syntax is:
basic_string substr(size_type pos = 0, size_type n = npos) const;
This function returns the substring as a new string object. The first argument, pos, is the index where the substring begins, and the character at pos is included in the substring. The second argument, n, is the length of the substring, that is, the number of characters counted from pos. Its default value, npos, means "up to the end of the string". The higher index is pos + n, and the character at that index is not included. Index counting begins from zero. The following program illustrates the use of this member function:
#include <iostream>
#include <string>
using namespace std;
int main(){
    string str = "one_two_three_four_five";
    // copy 5 characters, starting at index 8
    string substrin = str.substr(8, 5);
    cout << substrin << endl;
    return 0;
}
The output is:
three
If these two arguments are absent, the whole string is considered, as illustrated in the following program:
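#include <iostream>
#include <string>
using namespace std;
int main() {
    string str = "one_two_three_four_five";
    // no arguments: start at index 0 and go to the end of the string
    string substrin = str.substr();
    cout << substrin << endl;
    return 0;
}
The output is:
one_two_three_four_five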
The reserved word, const, at the end of the syntax means that the substr() function does not modify the string object it is called on. It copies the sub-string and returns the copy. It does not delete the sub-string from the original string.
Relating Iterator and Index
An integer can be added to a string iterator to move it forward. When an iterator points to the first character of a substring, adding the length of the substring (the number of characters) gives the iterator for the end of the range. The character at this end iterator is not included in the range or substring. Range and substring mean the same thing here, as they do above. For the substr() string member function, the second argument is this length.
The iterator that corresponds to index zero is:
str.begin()
Adding an index to this iterator gives the iterator for the character at that index. Adding the length of the substring to that iterator then gives the end of the range, whose character is not part of the substring.
The iterator that corresponds to the point just after the last character of the string is:
str.end()
An integer can be subtracted from this iterator in order to point to any desired character, counting back from the end of the string.
begin() and end() are member functions of the string class.
Deleting a Substring
A substring is identified in a string object by the same two values that substr() takes: the starting index, pos, and the length, n. The string class also has a member function called erase(), which is overloaded. One of the overloaded erase() member functions identifies the substring with pos and n. The syntax is:
basic_string& erase(size_type pos = 0, size_type n = npos);
This erase() function deletes the substring from the string object itself and returns a reference to that same object, which no longer contains the substring.
So, to delete a substring, the substr() function is not needed. It is its arguments that are needed. To delete a substring, use the erase member function of the string object. To have a copy of the substring, simply use the substr() function before erasing. The following program shows a good way of deleting a substring:
#include <iostream>
#include <string>
using namespace std;
int main() {
    string str = "one_two_three_four_five";
    string substrin = str.substr(8, 5);  // keep a copy before erasing
    string ret = str.erase(8, 5);        // str is modified; ret is a copy of the result
    cout << substrin << endl;
    cout << str << endl;
    cout << ret << endl;
    return 0;
}
The output is:
three
one_two__four_five
one_two__four_five
A syntax to delete a substring with iterator arguments is:
iterator erase(const_iterator first, const_iterator last);
With this syntax, the beginning of the substring is identified by the iterator first, which corresponds to the index pos. The end of the substring is identified by the iterator last, which is obtained by adding the length of the substring to first. The character at last is not erased. This overload returns an iterator rather than the string. The coding to delete a substring using this overloaded erase() function variant is left as an exercise to the reader.
Replacing a Substring
What really identifies a substring are two values: the starting index, pos, and the length counted from it. To return a substring, use the string class member function, substr(). To delete a substring, use the string class member function, erase(). And to replace a substring with one of any length, use the string class member function, replace(). The replace function has many overloaded variants. The one that uses index is:
basic_string& replace(size_type pos1, size_type n1, const T& t);
where pos1 is the starting index, n1 is the length of the substring to be replaced, and t is the independent replacement text, such as an array-of-chars. It returns a reference to the original string, which now includes the replacement.
Note: replace() removes the old characters and puts in the new ones in a single call, so a substring should not be deleted (erased) before it is replaced.
The following program shows a good way of replacing a substring:
#include <iostream>
#include <string>
using namespace std;
int main() {
    string str = "one_two_three_four_five";
    char chs[] = "ccc";
    string substrin = str.substr(8, 5);   // keep a copy of the old substring
    string ret = str.replace(8, 5, chs);  // 5 characters replaced by 3
    cout << substrin << endl;
    cout << str << endl;
    cout << ret << endl;
    return 0;
}
The output is:
three
one_two_ccc_four_five
one_two_ccc_four_five
The replacement for the above code was less than 5 characters in length. The following program shows the case where the replacement is greater than 5 characters:
#include <iostream>
#include <string>
using namespace std;
int main() {
    string str = "one_two_three_four_five";
    char chs[] = "cccccccc";
    string substrin = str.substr(8, 5);   // keep a copy of the old substring
    string ret = str.replace(8, 5, chs);  // 5 characters replaced by 8
    cout << substrin << endl;
    cout << str << endl;
    cout << ret << endl;
    return 0;
}
The output is:
three
one_two_cccccccc_four_five
one_two_cccccccc_four_five
A syntax to replace a substring with iterator arguments is:
basic_string& replace(const_iterator i1, const_iterator i2, const T& t);
With this syntax, the beginning of the substring is identified by the iterator i1, which corresponds to the index pos. The end of the substring is identified by the iterator i2, which is obtained by adding the length of the substring to i1. t has the same meaning as above. The following program shows how to use this syntax:
#include <iostream>
#include <string>
using namespace std;
int main() {
    string str = "one_two_three_four_five";
    string::const_iterator itB = str.begin();
    string::const_iterator itPos = itB + 8;       // iterator for index 8
    string::const_iterator itNpos = itPos + 5;    // one past the last character
    char chs[] = "ccccc";
    string substrin = str.substr(8, 5);           // keep a copy of the old substring
    string ret = str.replace(itPos, itNpos, chs);
    cout << substrin << endl;
    cout << str << endl;
    cout << ret << endl;
    return 0;
}
The output is:
three
one_two_ccccc_four_five
one_two_ccccc_four_five
Note that the iterators used are constant iterators. The iterator that corresponds to the index, pos, is obtained with itB + 8. The iterator that corresponds to the higher index is obtained with itPos + 5.
Conclusion
A sub-string (or substring, or range) is just a portion of the sequence of characters within a string. To return a substring, use the string class member function, substr(). To delete a substring, use the string class member function, erase(). To replace a substring, use the string class member function, replace(). For all these functions, the starting index, pos, and the length counted from it are key to identifying the substring within the principal string.