Pure Programmer

Strings

L1

This page is under construction. Please come back later.

At its heart, a string is simply a sequence of characters. Characters, in turn, are encoded as integers using encoding schemes such as [[ASCII]], [[ISO-Latin-1]] or [[Unicode]], so a string is ultimately a sequence of integers. Languages like FORTRAN or C, which don't support strings as a fundamental type, actually treat strings as a sequence of integers. Modern programming languages support strings as a fundamental data type, which makes them much easier to use and manipulate.

Strings are particularly useful for conveying information to the user, in contrast to integers or floating point values, which are mainly useful for computation. While strings are typically implemented as an array or list of the more basic character type, modern languages provide a string data type that encapsulates the characters of a string and provides functionality for working with them. In our programs we work with two kinds of strings: [[literal]] strings and string variables.

Variables and Literals

String literals are string values that are written directly in the program itself. A string literal represents only the one string exactly as it is typed into the code. String literals are enclosed in double quotes, in contrast to character literals, which are enclosed in single quotes. String literals can be used any place a string is required.

"Hello, world!"
"My name is: "
"Four score and seven years ago..."
L"I ❤ C++"		// Wide string literal
'z'				// This is not a string literal but a character literal
L'❤'			// This is not a string literal but a wide character literal

Example String Literals

String variables are like any other variable. This means that string variables must be declared before they can be used, must follow the variable naming rules discussed in the variables section, and can be set and reset to different values at will. Unlike variables of the primitive types such as boolean, integer and floating point, string variables represent an entire string of characters. String variables can be set using string literals or other string variables.

C++ has two different types of characters and strings. The type provided originally in C++'s predecessor C, namely char, supports only 8-bit characters and is therefore suitable for ASCII text, where each character fits in one char. The C++ string type is composed of these 8-bit char types. Later, wide characters of type wchar_t were added. These are implementation specific but typically support either UTF-16 or UTF-32 Unicode. The corresponding wide string type in C++ is the wstring type. The older 8-bit char/string types cannot be mixed directly with the newer wchar_t/wstring types. Also, these wide types can only be output to the wide version of the console object, namely wcout.

Concatenation

One of the most basic operations we can perform on strings is called concatenation. Concatenation takes two strings and forms a new one by appending the second string on to the end of the first string. Using concatenation we can build up a complex string from small strings. In C++ the concatenation operator is the + (plus) symbol.

Strings1.cpp
/******************************************************************************
 * This program demonstrates string concatenation.
 * 
 * Copyright © 2020 Richard Lesh.  All rights reserved.
 *****************************************************************************/

#include <iostream>
#include <string>

using namespace std;

int main(int argc, char **argv) {
	string const a = "one";
	string const b = "two";
	string const c = "three";
	string d = a + b + c;
	cout << d << endl;
	d = a + "\t" + b + "\n" + c;
	cout << d << endl;
	return 0;
}

Output
$ g++ -std=c++17 Strings1.cpp -o Strings1 -lfmt
$ ./Strings1
onetwothree
one	two
three

Operations on Strings

C++ supports a number of operations on strings. The most often used is determining the length of a string. Counting the characters in a string is accomplished using the length() method. Some of the other string operations available are summarized in the following table.

String Operations
Fixed Point    Scientific Notation
0              0e0
3.1415         3.1415e0
Strings2.cpp
/******************************************************************************
 * This program demonstrates basic string functions.
 * 
 * Copyright © 2020 Richard Lesh.  All rights reserved.
 *****************************************************************************/

#include <clocale>
#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

std::locale utf8loc(std::locale(), new std::codecvt_utf8<wchar_t>);
using namespace std;

int main(int argc, char **argv) {
	setlocale(LC_ALL, "en_US.UTF-8");
	wcout.imbue(utf8loc);
	wcin.imbue(utf8loc);

	wstring const alphabet = L"abcdefghijklmnopqrstuvwxyzabc";
	wstring const greekAlphabet = L"αβγδεζηθικλμνξοπρσςτυφχψωαβγ";
	wstring const emoji = L"😃😇🥰🤪🤑😴🤒🥵🥶🤯🥳😎😥😱😡🤬💀👽🤖😺🙈🙉🙊😃😇🥰";

	wcout << L"Length: " << alphabet.length() << endl;
	wcout << L"charAt(17): " << alphabet[17] << endl;
	wcout << L"codePointAt(17): " << (int)(alphabet[17]) << endl;
	wcout << L"substr(23, 26): " << alphabet.substr(23, 3) << endl;
	wcout << L"prefix(6): " << alphabet.substr(0, 6) << endl;
	wcout << L"right_tail(6): " << alphabet.substr(6) << endl;
	wcout << L"suffix(6): " << alphabet.substr(alphabet.length() - 6) << endl;
	wcout << L"find(\'def\'): " << alphabet.find(L"def") << endl;
	wcout << L"find(\'def\') is not found: " << (alphabet.find(L"def") == wstring::npos) << endl;
	wcout << L"find(\'bug\'): " << alphabet.find(L"bug") << endl;
	wcout << L"find(\'bug\') is not found: " << (alphabet.find(L"bug") == wstring::npos) << endl;
	wcout << L"rfind(\'abc\'): " << alphabet.rfind(L"abc") << endl;
	wcout << L"rfind(\'abc\') is not found: " << (alphabet.rfind(L"abc") == wstring::npos) << endl;
	wcout << L"rfind(\'bug\'): " << alphabet.rfind(L"bug") << endl;
	wcout << L"rfind(\'bug\') is not found: " << (alphabet.rfind(L"bug") == wstring::npos) << endl;

	wcout << L"Length: " << greekAlphabet.length() << endl;
	wcout << L"charAt(17): " << greekAlphabet[17] << endl;
	wcout << L"codePointAt(17): " << (int)(greekAlphabet[17]) << endl;
	wcout << L"substr(23, 26): " << greekAlphabet.substr(23, 3) << endl;
	wcout << L"prefix(6): " << greekAlphabet.substr(0, 6) << endl;
	wcout << L"right_tail(6): " << greekAlphabet.substr(6) << endl;
	wcout << L"suffix(6): " << greekAlphabet.substr(greekAlphabet.length() - 6) << endl;
	wcout << L"find(\'δεζ\'): " << greekAlphabet.find(L"δεζ") << endl;
	wcout << L"find(\'δεζ\') is not found: " << (greekAlphabet.find(L"δεζ") == wstring::npos) << endl;
	wcout << L"find(\'bug\'): " << greekAlphabet.find(L"bug") << endl;
	wcout << L"find(\'bug\') is not found: " << (greekAlphabet.find(L"bug") == wstring::npos) << endl;
	wcout << L"rfind(\'αβγ\'): " << greekAlphabet.rfind(L"αβγ") << endl;
	wcout << L"rfind(\'αβγ\') is not found: " << (greekAlphabet.rfind(L"αβγ") == wstring::npos) << endl;
	wcout << L"rfind(\'bug\'): " << greekAlphabet.rfind(L"bug") << endl;
	wcout << L"rfind(\'bug\') is not found: " << (greekAlphabet.rfind(L"bug") == wstring::npos) << endl;

	wcout << L"Length: " << emoji.length() << endl;
	wcout << L"charAt(16): " << emoji[16] << endl;
	wcout << L"codePointAt(16): " << (int)(emoji[16]) << endl;
	wcout << L"substr(20, 24): " << emoji.substr(20, 4) << endl;
	wcout << L"prefix(6): " << emoji.substr(0, 6) << endl;
	wcout << L"right_tail(6): " << emoji.substr(6) << endl;
	wcout << L"suffix(6): " << emoji.substr(emoji.length() - 6) << endl;
	wcout << L"find(\'😱😡🤬\'): " << emoji.find(L"😱😡🤬") << endl;
	wcout << L"find(\'😱😡🤬\') is not found: " << (emoji.find(L"😱😡🤬") == wstring::npos) << endl;
	wcout << L"find(\'bug\'): " << emoji.find(L"bug") << endl;
	wcout << L"find(\'bug\') is not found: " << (emoji.find(L"bug") == wstring::npos) << endl;
	wcout << L"rfind(\'😃😇🥰\'): " << emoji.rfind(L"😃😇🥰") << endl;
	wcout << L"rfind(\'😃😇🥰\') is not found: " << (emoji.rfind(L"😃😇🥰") == wstring::npos) << endl;
	wcout << L"rfind(\'bug\'): " << emoji.rfind(L"bug") << endl;
	wcout << L"rfind(\'bug\') is not found: " << (emoji.rfind(L"bug") == wstring::npos) << endl;
	return 0;
}

Output
$ g++ -std=c++17 Strings2.cpp -o Strings2 -lfmt
$ ./Strings2
Length: 29
charAt(17): r
codePointAt(17): 114
substr(23, 26): xyz
prefix(6): abcdef
right_tail(6): ghijklmnopqrstuvwxyzabc
suffix(6): xyzabc
find('def'): 3
find('def') is not found: 0
find('bug'): 18446744073709551615
find('bug') is not found: 1
rfind('abc'): 26
rfind('abc') is not found: 0
rfind('bug'): 18446744073709551615
rfind('bug') is not found: 1
Length: 28
charAt(17): σ
codePointAt(17): 963
substr(23, 26): ψωα
prefix(6): αβγδεζ
right_tail(6): ηθικλμνξοπρσςτυφχψωαβγ
suffix(6): χψωαβγ
find('δεζ'): 3
find('δεζ') is not found: 0
find('bug'): 18446744073709551615
find('bug') is not found: 1
rfind('αβγ'): 25
rfind('αβγ') is not found: 0
rfind('bug'): 18446744073709551615
rfind('bug') is not found: 1
Length: 26
charAt(16): 💀
codePointAt(16): 128128
substr(20, 24): 🙈🙉🙊😃
prefix(6): 😃😇🥰🤪🤑😴
right_tail(6): 🤒🥵🥶🤯🥳😎😥😱😡🤬💀👽🤖😺🙈🙉🙊😃😇🥰
suffix(6): 🙈🙉🙊😃😇🥰
find('😱😡🤬'): 13
find('😱😡🤬') is not found: 0
find('bug'): 18446744073709551615
find('bug') is not found: 1
rfind('😃😇🥰'): 23
rfind('😃😇🥰') is not found: 0
rfind('bug'): 18446744073709551615
rfind('bug') is not found: 1

Comparing Strings

Like numeric types, string types are comparable. This means that we can order strings and determine which ones are lower and which ones are higher in that ordering. Strings are ordered using [[Lexicographical Order]], also known as Alphabetic or Dictionary Order. This is done by comparing the first characters of the two strings to determine which string is lower and which is higher. If the first characters are the same, we move on to the second character in each string and so on until a difference is found. A shorter string always compares lower than a longer string that has the shorter string as its prefix. As with numeric types, we have six comparison operators that allow us to determine how two strings compare lexicographically. Additionally, we have the compare() method, which returns a negative value if the first string is less than the second, a positive value if the first string is greater than the second, and 0 if the two strings are equal.

String Comparison Operators
Operator    Meaning
==          equal to
!=          not equal to
<           less than
<=          less than or equal
>           greater than
>=          greater than or equal
Strings3.cpp
/******************************************************************************
 * This program demonstrates string comparisons.
 * 
 * Copyright © 2020 Richard Lesh.  All rights reserved.
 *****************************************************************************/

#include <clocale>
#include <codecvt>
#include <iostream>
#include <locale>
#include <string>

std::locale utf8loc(std::locale(), new std::codecvt_utf8<wchar_t>);
using namespace std;

int main(int argc, char **argv) {
	setlocale(LC_ALL, "en_US.UTF-8");
	wcout.imbue(utf8loc);
	wcin.imbue(utf8loc);

	wstring const color1 = L"Blue";
	wstring const color2 = L"Red";
	bool result;

	wcout << L"compare(color1, color2): " << color1.compare(color2) << endl;
	result = color1 < color2;
	wcout << L"color1 < color2: " << result << endl;
	result = color1 > color2;
	wcout << L"color1 > color2: " << result << endl;
	result = color1 == color2;
	wcout << L"color1 == color2: " << result << endl;
	result = color1 != color2;
	wcout << L"color1 != color2: " << result << endl;
	return 0;
}

Output
$ g++ -std=c++17 Strings3.cpp -o Strings3 -lfmt
$ ./Strings3
compare(color1, color2): -1
color1 < color2: 1
color1 > color2: 0
color1 == color2: 0
color1 != color2: 1

Character Types

Often it is necessary to determine to which class an individual character belongs. Classes such as alphabetic, numeric, whitespace, control and punctuation are the common ones in which we would be interested. The Character Classification table below lists the functions we would use to determine the class of a character.

Strings4.cpp
/******************************************************************************
 * This program illustrates some of the string functions in Utils
 * 
 * Copyright © 2020 Richard Lesh.  All rights reserved.
 *****************************************************************************/

#include "Utils.hpp"
#include <iostream>
#include <string>

using namespace std;

static string const ltrstr = "   Spaces to the left";
static string const rtrstr = "Spaces to the right  ";
static string const trimstr = "   Spaces at each end   ";
static string const blank = "  \t\n  ";
static string const lowerstr = "This String is Lowercase";
static string const upperstr = "This String is Uppercase";

int main(int argc, char **argv) {
	cout << '|' << Utils::ltrim(ltrstr) << '|' << endl;
	cout << '|' << Utils::rtrim(rtrstr) << '|' << endl;
	cout << '|' << Utils::trim(trimstr) << '|' << endl;
	cout << '|' << Utils::trim(blank) << '|' << endl;
	cout << '|' << Utils::tolower(lowerstr) << '|' << endl;
	cout << '|' << Utils::toupper(upperstr) << '|' << endl;
	return 0;
}


Output
$ g++ -std=c++17 Strings4.cpp -o Strings4 -lfmt
$ ./Strings4
|Spaces to the left|
|Spaces to the right|
|Spaces at each end|
||
|this string is lowercase|
|THIS STRING IS UPPERCASE|
Character Classification
Function      Description
isalnum(c)    Check if character is alphanumeric.
isalpha(c)    Check if character is alphabetic.
isblank(c)    Check if character is blank.
iscntrl(c)    Check if character is a control character.
isdigit(c)    Check if character is decimal digit.
isgraph(c)    Check if character has graphical representation.
islower(c)    Check if character is lowercase letter.
isprint(c)    Check if character is printable.
ispunct(c)    Check if character is a punctuation character.
isspace(c)    Check if character is a white-space.
isupper(c)    Check if character is uppercase letter.
isxdigit(c)   Check if character is hexadecimal digit.

Character Conversion
Function      Description
tolower(c)    Returns lowercase version of the character.
toupper(c)    Returns uppercase version of the character.
TODO: Explain stringstream, as well as to_string() and to_wstring(), for converting numeric values to strings.

Questions

Projects

More ★'s indicate higher difficulty level.

References