9.6. strings Revisited
We introduced the string type in Section 3.2 (p. 80). Table 9.12 (p. 337) recaps the string operations covered in that section.
In addition to the operations we've already used, strings also support most of the sequential container operations. In some ways, we can think of a string as a container of characters. With some exceptions, strings support the same operations that vectors support. The exceptions are the operations that use the container like a stack: We cannot use the front, back, and pop_back operations on strings. (This restriction applies to the library described here; C++11 added front, back, and pop_back to string.) The container operations that string supports are:
When we say that string supports the container operations, we mean that we could take a program that manipulates a vector and rewrite that same program to operate on strings. For example, we could use iterators to print the characters of a string, one per line, to the standard output:
string s("Hiya!");
string::iterator iter = s.begin();
while (iter != s.end())
cout << *iter++ << endl; // postfix increment: print old value
Not surprisingly, this code looks almost identical to the code from page 163 that printed the elements of a vector<int>. In addition to the operations that string shares with the containers, string supports other operations that are specific to strings. We will review these string-specific operations in the remainder of this section. These operations include additional versions of container-related operations as well as other, completely new functions. The additional functions that string provides are covered starting on page 341. The additional versions of the container operations that string provides are defined to support attributes that are unique to strings and not shared by the containers. For example, several operations permit us to specify arguments that are pointers to character arrays. These operations support the close interaction between library strings and character arrays, whether null-terminated or not. Other versions let us use indices rather than iterators. These versions operate positionally: We specify a starting position, and in some cases a count, to specify the element or range of elements which we want to manipulate.
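The claim that a program written for a vector can be rewritten for strings can be taken one step further: a single function template can run the same iterator loop over either kind of sequence. The following is a minimal sketch assuming a C++11 compiler; print_one_per_line and one_per_line are hypothetical names, and the sketch writes '\n' rather than endl only to avoid flushing the stream after every character.

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Writes each element of any sequence that provides const_iterator,
// begin(), and end() to os, one element per line. The same body works
// for std::vector<int>, std::vector<char>, and std::string.
template <typename Sequence>
void print_one_per_line(const Sequence& seq, std::ostream& os)
{
    typename Sequence::const_iterator iter = seq.begin();
    while (iter != seq.end())
        os << *iter++ << '\n';  // postfix increment: print old value
}

// Captures what print_one_per_line writes so the output can be inspected.
template <typename Sequence>
std::string one_per_line(const Sequence& seq)
{
    std::ostringstream out;
    print_one_per_line(seq, out);
    return out.str();
}
```

For example, one_per_line(std::string("Hiya!")) and one_per_line(std::vector<char>{'H', 'i', 'y', 'a', '!'}) produce the same five-line text.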
Readers might want to skim the remainder of Section 9.6. Once you know what kinds of operations are available, you can return for the details when writing programs that need to use a given operation.

9.6.1. Other Ways to Construct strings
The string class supports all but one of the constructors in Table 9.2 (p. 307). The constructor that takes a single size parameter is not supported for string. We can create a string as the empty string, by providing no argument; as a copy of another string; from a pair of iterators; or from a count and a character:
string s1;          // s1 is the empty string
string s2(5, 'a');  // s2 == "aaaaa"
string s3(s2);      // s3 is a copy of s2
string s4(s3.begin(), s3.begin() + s3.size() / 2); // s4 == "aa"
In addition to these constructors, the string type supports three other ways to create a string. We have already used the constructor that takes a pointer to the first character in a null-terminated character array. There is another constructor that takes a pointer to an element in a character array and a count of how many characters to copy. Because the constructor takes a count, the array does not have to be null-terminated:
const char *cp = "Hiya";       // null-terminated array
char c_array[] = "World!!!!";  // null-terminated
char no_null[] = {'H', 'i'};   // not null-terminated
string s1(cp);                 // s1 == "Hiya"
string s2(c_array, 5);         // s2 == "World"
string s3(c_array + 5, 4);     // s3 == "!!!!"
string s4(no_null);            // error: no_null is not null-terminated (undefined behavior)
string s5(no_null, 2);         // ok: s5 == "Hi"
We define s1 using the constructor that takes a pointer to the first character of a null-terminated array. All the characters in that array, up to but not including the terminating null, are copied into the newly created string. The initializer for s2 uses the second constructor, taking a pointer and a count. In this case, we start at the character denoted by the pointer and copy as many characters as indicated in the second argument. s2, therefore, is a copy of the first five characters from the array c_array. Remember that when we pass an array as an argument, it is automatically converted to a pointer to its first element. Of course, we are not restricted to passing a pointer to the beginning of the array. We initialize s3 to hold four exclamation points by passing a pointer to the first exclamation point in c_array.
The initializers for s4 and s5 are not C-style strings. The definition of s4 is an error. This form of initialization may be called only with a null-terminated array. Passing an array that does not contain a null is a serious error (Section 4.3, p. 130), although it is an error that the compiler cannot detect. What happens at run time is undefined. The initialization of s5 is fine: That initializer includes a count that says how many characters to copy. As long as the count is within the size of the array, it doesn't matter whether the array is null-terminated.
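The constructors described above can be exercised together in one place. This is a minimal sketch assuming a C++11 compiler; constructor_demo and the local variable names are hypothetical, and the undefined-behavior case (constructing from no_null without a count) is left commented out rather than executed.

```cpp
#include <string>

// Hypothetical demo: builds strings with the constructors described
// above and joins them with '|' so the results can be checked together.
std::string constructor_demo()
{
    const char *cp = "Hiya";        // null-terminated array
    char c_array[] = "World!!!!";   // null-terminated
    char no_null[] = {'H', 'i'};    // not null-terminated

    std::string from_cstr(cp);              // "Hiya": copies up to the null
    std::string first_five(c_array, 5);     // "World": pointer and count
    std::string bangs(c_array + 5, 4);      // "!!!!": starts mid-array
    std::string counted(no_null, 2);        // "Hi": the count makes this safe
    // std::string bad(no_null);            // undefined behavior: no null to stop at
    std::string repeated(5, 'a');           // "aaaaa": count and character
    std::string half(repeated.begin(),
                     repeated.begin() + repeated.size() / 2);  // "aa"

    return from_cstr + "|" + first_five + "|" + bangs + "|" +
           counted + "|" + repeated + "|" + half;
}
```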
Using a Substring as the Initializer
The other pair of constructors allows us to create a string as a copy of a substring of the characters in another string (here s1 is the string "Hiya" from the previous example):
string s6(s1, 2);    // s6 == "ya"
string s7(s1, 0, 2); // s7 == "Hi"
string s8(s1, 0, 8); // s8 == "Hiya"
The first two arguments are the string from which we want to copy and a starting position. In the two-argument version, the newly created string is initialized with the characters from that position to the end of the string argument. We can also provide a third argument that specifies how many characters to copy. In this case, we copy as many characters as indicated (up to the size of the string), starting at the specified position. For example, when we create s7, we copy two characters from s1, starting at position zero. When we create s8, we copy only four characters, not the requested eight. Regardless of how many characters we ask to copy, the library copies up to the size of the string, but not more. The starting position, however, must not exceed the size of the string; if it does, the constructor throws out_of_range.

9.6.2. Other Ways to Change a string
Many of the container operations that string supports operate in terms of iterators. For example, erase takes an iterator or iterator range to specify which element(s) to remove from the container. Similarly, the first argument to each version of insert takes an iterator to indicate the position before which to insert the values represented by the other arguments. Although string supports these iterator-based operations, it also supplies operations that work in terms of an index. The index is used to indicate the starting element to erase or the position before which to insert the appropriate values. Table 9.14 lists the operations that are common to both string and the containers; Table 9.15 on the facing page lists the string-only operations.
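To see that the index-based operations are alternatives to the iterator-based ones rather than replacements, the sketch below performs the same erase and the same insert both ways. The four function names are hypothetical. One difference is worth noting: erase(pos, count) clamps count to the characters that remain and throws out_of_range if pos is past the end, whereas the iterator form performs no checking, so the sketch asserts its precondition explicitly.

```cpp
#include <cassert>
#include <string>

// Erase `count` characters starting at index `pos` with the iterator-range
// erase that every sequential container provides. This form does not check
// or clamp its range, so the precondition is asserted.
std::string erase_with_iterators(std::string s, std::string::size_type pos,
                                 std::string::size_type count)
{
    assert(pos + count <= s.size());
    s.erase(s.begin() + pos, s.begin() + pos + count);
    return s;
}

// Same effect with the string-only index-based erase(pos, count),
// which clamps count to the characters remaining after pos.
std::string erase_with_index(std::string s, std::string::size_type pos,
                             std::string::size_type count)
{
    s.erase(pos, count);
    return s;
}

// Insert `text` before the character at index `pos`, using an iterator...
std::string insert_with_iterator(std::string s, std::string::size_type pos,
                                 const std::string& text)
{
    assert(pos <= s.size());
    s.insert(s.begin() + pos, text.begin(), text.end());
    return s;
}

// ...and using the string-only index-based insert(pos, str).
std::string insert_with_index(std::string s, std::string::size_type pos,
                              const std::string& text)
{
    s.insert(pos, text);
    return s;
}
```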
Position-Based Arguments
The string-specific versions of these operations take arguments similar to those of the additional constructors covered in the previous section. These operations let us deal with strings positionally and/or let us use arguments that are pointers to character arrays rather than strings.
For example, all containers let us specify a pair of iterators that denote a range of elements to erase. For strings, we can also specify the range by passing a starting position and a count of the number of elements to erase. Assuming s is at least five characters long, we could erase the last five characters as follows:
s.erase(s.size() - 5, 5); // erase last five characters from s
Similarly, we can insert a given number of values in a container before the element referred to by an iterator. In the case of strings, we can specify the insertion point as an index rather than using an iterator:
s.insert(s.size(), 5, '!'); // insert five exclamation points at end of s

Specifying the New Contents
The characters to insert or assign into the string can be taken from a character array or another string. For example, we can use a null-terminated character array as the value to insert or assign into a string:
const char *cp = "Stately plump Buck";
string s;
s.assign(cp, 7);            // s == "Stately"
s.insert(s.size(), cp + 7); // s == "Stately plump Buck"
Similarly, we can insert a copy of one string into another as follows:
s = "some string";
string s2 = "some other string";
// 3 equivalent ways to insert all the characters from s2 at beginning of s
// insert iterator range before s.begin()
s.insert(s.begin(), s2.begin(), s2.end());
// insert copy of s2 before position 0 in s
s.insert(0, s2);
// insert s2.size() characters from s2 starting at s2[0] before s[0]
s.insert(0, s2, 0, s2.size());

9.6.3. string-Only Operations
The string type provides several other operations that the containers do not: the substr function, which returns a substring of the current string; the append and replace functions, which modify the string; a family of find functions, which search the string; and a family of compare functions, which compare strings and substrings.
The substr Operation
The substr operation lets us retrieve a substring from a given string. We can pass substr a starting position and a count. It creates a new string that has that many characters (up to the end of the string) from the target string, starting at the given position:
string s("hello world");
// return substring of 5 characters starting at position 6
string s2 = s.substr(6, 5); // s2 == "world"
Alternatively, we could obtain the same result by writing:
// return substring from position 6 to the end of s
string s3 = s.substr(6);    // s3 == "world"
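substr clamps the count in the same way the substring constructors do, but the starting position must be valid: if it is greater than size(), substr throws std::out_of_range. The hypothetical helper try_substr below is a minimal sketch that turns that exception into a false return value, which makes the boundary behavior easy to see.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper: stores s.substr(pos, n) in out and returns true,
// or returns false (leaving out unchanged) when pos > s.size(),
// which is the case in which substr throws std::out_of_range.
bool try_substr(const std::string& s, std::string::size_type pos,
                std::string::size_type n, std::string& out)
{
    try {
        out = s.substr(pos, n);  // n is clamped to the characters left
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```

Note that a starting position equal to size() is valid and yields an empty string; only positions beyond the end throw.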
The append and replace Functions
There are six overloaded versions of append and ten versions of replace. The append and replace functions are overloaded using the same set of arguments, which are listed in Table 9.18 on the next page. These arguments specify the characters to add to the string. In the case of append, the characters are added at the end of the string. In the replace function, these characters are inserted in place of a specified range of existing characters in the string.
The append operation is a shorthand way of inserting at the end:
string s("C++ Primer");         // initialize s to "C++ Primer"
s.append(" 3rd Ed.");           // s == "C++ Primer 3rd Ed."
// equivalent to s.append(" 3rd Ed.")
s.insert(s.size(), " 3rd Ed.");
The replace operations remove an indicated range of characters and insert a new set of characters in their place. The replace operations have the same effect as calling erase and insert. The ten different versions of replace differ from each other in how we specify the characters to remove and in how we specify the characters to insert in their place. The first two arguments specify the range of elements to remove. We can specify the range either with an iterator pair or an index and a count. The remaining arguments specify what new characters to insert. We can think of replace as a shorthand way of erasing some characters and inserting others in their place:
// starting at position 11, erase 3 characters and then insert "4th"
s.replace(11, 3, "4th"); // s == "C++ Primer 4th Ed."
// equivalent way to replace "3rd" by "4th"
s.erase(11, 3);          // s == "C++ Primer  Ed." (both spaces remain)
s.insert(11, "4th");     // s == "C++ Primer 4th Ed."
In the previous call to replace, the text we inserted happens to be the same size as the text we removed. We could insert a larger or smaller string:
s.replace(11, 3, "Fourth"); // s == "C++ Primer Fourth Ed."
In this call we remove three characters but insert six in their place.
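The equivalences described above (append as insertion at the end, replace as erase followed by insert) can be checked directly. The helper names via_append, via_replace, and via_erase_insert are hypothetical; each takes its string by value so the caller's copy is left unchanged.

```cpp
#include <string>

// append(text) is shorthand for insert(size(), text).
std::string via_append(std::string s, const std::string& text)
{
    s.append(text);
    return s;
}

// Remove len characters at pos and put text in their place in one call.
std::string via_replace(std::string s, std::string::size_type pos,
                        std::string::size_type len, const std::string& text)
{
    s.replace(pos, len, text);
    return s;
}

// The same edit spelled out as erase followed by insert.
std::string via_erase_insert(std::string s, std::string::size_type pos,
                             std::string::size_type len, const std::string& text)
{
    s.erase(pos, len);
    s.insert(pos, text);
    return s;
}
```

Because text need not have the same length as the removed range, via_replace(ed, 11, 3, "Fourth") with ed == "C++ Primer 3rd Ed." yields "C++ Primer Fourth Ed.", just as the erase-and-insert version does.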
9.6.4. string Search Operations
The string class provides six search functions, each named as a variant of find: find, rfind, find_first_of, find_last_of, find_first_not_of, and find_last_not_of. The operations all return a string::size_type value that is the index of where the match occurred, or a special value named string::npos if there is no match. The string class defines npos as a value that is guaranteed to be greater than any valid index. There are four versions of each of the search operations, each of which takes a different set of arguments. The arguments to the search operations are listed in Table 9.20. Basically, these operations differ as to whether they are looking for a single character, another string, a C-style null-terminated string, or a given number of characters from a character array.
Finding an Exact Match
The simplest of the search operations is the find function. It looks for its argument and returns the index of the first match that is found, or npos if there is no match:
string name("AnnaBelle");
string::size_type pos1 = name.find("Anna"); // pos1 == 0
find returns 0, the index at which the substring "Anna" is found in "AnnaBelle".
When we look for a value in the string, case matters:
string lowercase("annabelle");
pos1 = lowercase.find("Anna"); // pos1 == npos
This code sets pos1 to npos: the string "Anna" does not match "anna".
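Because find is case-sensitive, a case-insensitive search has to normalize the case of both strings first. The following is one simple approach, sketched under stated assumptions: to_lower_copy and find_ignore_case are hypothetical helpers (not library functions), the strings are assumed to hold plain ASCII text in the default "C" locale, and std::tolower is applied one character at a time, which does not handle multibyte encodings. A C++11 compiler is assumed for the lambda.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lowercase copy of s. Passing each char
// as unsigned char avoids undefined behavior in std::tolower for
// negative char values.
std::string to_lower_copy(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Hypothetical case-insensitive search: lowercases both strings, then
// calls the ordinary (case-sensitive) find. Like find, it returns
// std::string::npos when there is no match.
std::string::size_type find_ignore_case(const std::string& text,
                                        const std::string& target)
{
    return to_lower_copy(text).find(to_lower_copy(target));
}
```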
Find Any Character
A slightly more complicated problem would be if we wanted to match any character in our search string. For example, the following locates the first digit within name:
string numerics("0123456789");
string name("r2d2");
string::size_type pos = name.find_first_of(numerics);
cout << "found number at index: " << pos
     << " element is " << name[pos] << endl;
In this example, pos is set to a value of 1 (the elements of a string, remember, are indexed beginning at 0).

Specifying Where to Start the Search
We can pass an optional starting position to the find operations. This optional argument indicates the index position from which to start the search. By default, that position is set to zero. One common programming pattern uses this optional argument to loop through a string finding all occurrences. We could rewrite our search of "r2d2" to find all the numbers in name:
string::size_type pos = 0;
// each trip reset pos to the next instance in name
while ((pos = name.find_first_of(numerics, pos)) != string::npos) {
    cout << "found number at index: " << pos
         << " element is " << name[pos] << endl;
    ++pos; // move to the next character
}
In this case, we initialize pos to zero so that on the first trip through the while loop, name is searched beginning at position 0. The condition in the while resets pos to the index of the first number encountered, starting from the current value of pos. As long as the return from find_first_of is a valid index, we print our result and increment pos.
Had we neglected to increment pos at the end of this loop, the loop would never terminate. To see why, consider what would happen if we didn't. On the second trip through the loop, we start looking at the character indexed by pos. That character would be a number, so find_first_of would (repeatedly) return pos!
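The loop above can be packaged as a function that collects every matching position instead of printing it. positions_of_any is a hypothetical name; the sketch keeps the essential ++pos step that prevents the infinite loop just described.

```cpp
#include <string>
#include <vector>

// Returns the index of every character in s that also appears in chars,
// using the find_first_of loop described in the text.
std::vector<std::string::size_type>
positions_of_any(const std::string& s, const std::string& chars)
{
    std::vector<std::string::size_type> result;
    std::string::size_type pos = 0;
    while ((pos = s.find_first_of(chars, pos)) != std::string::npos) {
        result.push_back(pos);
        ++pos;  // without this, find_first_of would keep returning pos
    }
    return result;
}
```

When pos steps past the last character, find_first_of simply returns npos, so the loop ends cleanly.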
Looking for a Nonmatch
Instead of looking for a match, we might call find_first_not_of to find the first position that is not in the search argument. For example, to find the first non-numeric character of a string, we can write
string numbers("0123456789");
string dept("03714p3");
// returns 5, which is the index to the character 'p'
string::size_type pos = dept.find_first_not_of(numbers);

Searching Backward
Each of the find operations that we've seen so far executes left to right. The library provides an analogous set of operations that look through the string from right to left. The rfind member searches for the last (that is, rightmost) occurrence of the indicated substring:
string river("Mississippi");
string::size_type first_pos = river.find("is");  // returns 1
string::size_type last_pos = river.rfind("is");  // returns 4
find returns an index of 1, indicating the start of the first "is", while rfind returns an index of 4, indicating the start of the last occurrence of "is".

The find_last Functions
The find_last functions operate like the corresponding find_first functions, except that they return the last match rather than the first:
find_last_of looks for the last character that matches any element of the search string.
find_last_not_of looks for the last character that does not match any element of the search string.
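The nonmatch and backward searches can be checked against the indices given in the text. search_demo and SearchResults are hypothetical names used only for this sketch; the last field also exercises rfind's optional position argument, which restricts the search to matches that start at or before that index.

```cpp
#include <string>

// Hypothetical container for the results of each search call.
struct SearchResults {
    std::string::size_type first_non_digit;  // find_first_not_of
    std::string::size_type first_is;         // find
    std::string::size_type last_is;          // rfind
    std::string::size_type last_digit;       // find_last_of
    std::string::size_type last_non_digit;   // find_last_not_of
    std::string::size_type last_is_upto_3;   // rfind with a start position
};

SearchResults search_demo()
{
    const std::string numbers("0123456789");
    const std::string dept("03714p3");
    const std::string river("Mississippi");

    SearchResults r;
    r.first_non_digit = dept.find_first_not_of(numbers);  // 'p'
    r.first_is = river.find("is");
    r.last_is = river.rfind("is");
    r.last_digit = dept.find_last_of(numbers);            // the final '3'
    r.last_non_digit = dept.find_last_not_of(numbers);    // 'p' again
    r.last_is_upto_3 = river.rfind("is", 3);  // the "is" at 4 starts too late
    return r;
}
```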
Like the other find operations, find_last_of and find_last_not_of each take an optional second argument indicating the position within the string at which to begin searching; because they search backward, only characters at or before that position are considered.

9.6.5. Comparing strings
As we saw in Section 3.2.3 (p. 85), the string type defines all the relational operators so that we can compare two strings for equality (==), inequality (!=), and the less- or greater-than operations (<, <=, >, >=). Comparison between strings is lexicographical; that is, string comparison is the same as a case-sensitive dictionary ordering:
string cobol_program_crash("abend");
string cplus_program_crash("abort");
Here cobol_program_crash is less than cplus_program_crash. The relational operators compare two strings character by character until reaching a position where the two strings differ. The overall comparison of the strings depends on the comparison between these unequal characters. In this case, the first unequal characters are 'e' and 'o'. The letter 'e' occurs before (is less than) 'o' in the English alphabet, and so "abend" is less than "abort". If the strings are of different length and the shorter string is a prefix of the longer one (all of its characters match the initial characters of the longer string), then the shorter string is less than the longer.

The compare Functions
In addition to the relational operators, string provides a set of compare operations that perform lexicographical comparisons. The results of these operations are similar to the C library strcmp function (Section 4.3, p. 132). Given
s1.compare (args);
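compare returns one of three possible values:
A positive value if s1 is greater than the string represented by args
A negative value if s1 is less than the string represented by args
0 if s1 is equal to the string represented by args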
For example:
// returns a negative value
cobol_program_crash.compare(cplus_program_crash);
// returns a positive value
cplus_program_crash.compare(cobol_program_crash);
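The relational operators and compare agree in sign. Because compare promises only a negative, zero, or positive result (not specifically -1 or 1), the hypothetical helper sign_of_compare below normalizes the result first; less_agrees_with_compare, also hypothetical, checks that < and compare give the same answer for a pair of strings.

```cpp
#include <string>

// Normalize a.compare(b) to -1, 0, or 1. compare guarantees only the
// sign of its result, not a particular magnitude.
int sign_of_compare(const std::string& a, const std::string& b)
{
    int r = a.compare(b);
    return (r > 0) - (r < 0);
}

// True when the relational operator < agrees with the sign of compare.
bool less_agrees_with_compare(const std::string& a, const std::string& b)
{
    return (a < b) == (a.compare(b) < 0);
}
```

For example, sign_of_compare("abc", "abcd") yields -1 because "abc" is a prefix of "abcd", while sign_of_compare("bc", "abc") yields 1: "bc" occurs inside "abc" but is not a prefix, and 'b' is greater than 'a'.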
The overloaded set of six compare operations allows us to compare a substring of either one or both strings. They also let us compare a string to a character array or a portion thereof:
char second_ed[] = "C++ Primer, 2nd Edition";
string third_ed("C++ Primer, 3rd Edition");
string fourth_ed("C++ Primer, 4th Edition");
// compares C++ library string to C-style string
fourth_ed.compare(second_ed); // ok, second_ed is null-terminated
// compare substrings of fourth_ed and third_ed
fourth_ed.compare(fourth_ed.find("4th"), 3,
                  third_ed, third_ed.find("3rd"), 3);
The second call to compare is the most interesting. This call uses the version of compare that takes five arguments. We use find to locate the position of the beginning of the substring "4th". We compare three characters starting at that position to a substring from third_ed. That substring begins at the position returned from find when looking for "3rd", and again we compare three characters. Essentially, this call compares "4th" to "3rd".
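The compare calls above, plus the overload that compares a substring of a string against a C-style string, can be gathered into one function. compare_editions, EditionComparisons, and compare_sign are hypothetical names; results are normalized to -1, 0, or 1 because compare guarantees only the sign of its result.

```cpp
#include <string>

// Hypothetical holder for the normalized result of each comparison.
struct EditionComparisons {
    int whole_vs_cstring;   // fourth_ed vs second_ed (a char array)
    int substrings;         // "4th" vs "3rd"
    int prefix_vs_cstring;  // first 10 chars of fourth_ed vs "C++ Primer"
};

// Normalizes a compare result to -1, 0, or 1.
int compare_sign(int r) { return (r > 0) - (r < 0); }

EditionComparisons compare_editions()
{
    char second_ed[] = "C++ Primer, 2nd Edition";
    std::string third_ed("C++ Primer, 3rd Edition");
    std::string fourth_ed("C++ Primer, 4th Edition");

    EditionComparisons c;
    c.whole_vs_cstring = compare_sign(fourth_ed.compare(second_ed));
    c.substrings = compare_sign(fourth_ed.compare(fourth_ed.find("4th"), 3,
                                                  third_ed, third_ed.find("3rd"), 3));
    c.prefix_vs_cstring = compare_sign(fourth_ed.compare(0, 10, "C++ Primer"));
    return c;
}
```

Both edition comparisons come out positive because the first differing characters are '4' versus '2' and '4' versus '3'; the last comparison yields 0 because the first ten characters of fourth_ed are exactly "C++ Primer".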