Output Formatting

We have seen the basic ability to convert values to strings and output them to the console ([[stdout]]). Sometimes, however, we would like more control over how values are formatted on output. Fortunately, formatted output functions give us that control.

C-Style printf()

When the C language was developed in the 1970s, it introduced a novel way of formatting output: the printf() function. This function takes a format string followed by values that are substituted into the format string according to the type specifiers found in it. This approach has proven very popular and has been copied by many languages since.

With formatted output we must provide a format string that can contain literal text interspersed with type specifiers. The type specifiers are placeholders that identify where and how a value is to be formatted and inserted into the string on output. A type specifier starts with a percent sign (%), followed by an optional flag, an optional field width and/or precision, and finally one or two characters identifying the type of the value. Only the percent sign and the type characters are required; the other components are optional.

%[flag][width][.precision][type specifier]

In addition to the format string, we also provide the values to be formatted to the function. We can have multiple type specifiers in the format string as long as we provide the matching number of additional arguments. The first argument is formatted by the first type specifier, the second argument by the second type specifier, and so on. Perl has a native implementation of printf() that we can use in our programs.

printf() Illustration — printf() specifiers must match arguments in order

Any other characters in the format string that are not part of a type specifier are printed verbatim. For example:

printf("Page count is %d\n", $numPages);
printf("PI: %.2f\n", $pi);
printf("Title and Page Count: %s %d\n", $title, $numPages);
printf("%d of %d\n", $currentPage, $numPages);

Some of the type specifiers available are listed in the table below.

Frequently Used Type Specifiers
Specifier	Value Type	Description
d	signed integer	Signed decimal
u	unsigned integer	Unsigned decimal
x	unsigned integer	Hexadecimal (lowercase)
X	unsigned integer	Hexadecimal (uppercase)
f	floating point	Decimal floating point
e	floating point	Scientific notation (lowercase)
E	floating point	Scientific notation (uppercase)
g	floating point	Shortest representation: %e or %f
G	floating point	Shortest representation: %E or %f
c	integer	Single character
s	string	String of characters
p	any	Pointer address in hexadecimal
%	none	Outputs literal %

Print Flags
Flag	Description
-	Left-justify within the given field width; right justification is the default.
+	Result is preceded by a plus or minus sign (+ or -) even for positive numbers. By default, only negative numbers are preceded by a - sign.
#	Used with the o, x or X specifiers, the value is preceded by 0, 0x or 0X respectively. Used with e, E, f, g or G, it forces the written output to contain a decimal point even if no more digits follow. By default, if no digits follow, no decimal point is written.
0	Left-pads the number with zeroes (0) instead of spaces.
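
As a minimal sketch of the flags in the table above, the following script formats the same integer several ways with sprintf() and wraps each result in | characters so the padding is visible. The variable names are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $n = 42;

# Each result is wrapped in |...| so that padding is visible.
my $right = sprintf("|%6d|",  $n);   # no flag: right-justified, space-padded
my $left  = sprintf("|%-6d|", $n);   # '-' flag: left-justified
my $plus  = sprintf("|%+d|",  $n);   # '+' flag: sign shown for positive values too
my $zero  = sprintf("|%06d|", $n);   # '0' flag: padded with zeros instead of spaces
my $octal = sprintf("|%#o|",  $n);   # '#' flag with o: leading 0
my $hex   = sprintf("|%#x|",  $n);   # '#' flag with x: leading 0x

print "$_\n" for $right, $left, $plus, $zero, $octal, $hex;
```

Run as written, this should print |    42|, |42    |, |+42|, |000042|, |052| and |0x2a|, one per line.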

The field width and precision are simply integer values separated by a period (.). One may use either in the type specifier, both or none. Field width sets the number of characters that will be used to output the value padded with spaces if necessary to achieve the field width. The precision specifier is only useful for formatting floating point and string values. For floating point values it limits the number of decimal places after the decimal point. For strings it truncates the string to the specified number of characters.
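
The following sketch illustrates how field width and precision interact for floating point and string values. The last line also shows that the field width acts as a minimum: a value wider than the field is printed in full. Variable names are illustrative only.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $pi   = 3.1415926;
my $word = "Bananas";

my $fixed     = sprintf("|%8.3f|", $pi);    # width 8, 3 digits after the decimal point
my $precise   = sprintf("|%.2f|",  $pi);    # precision only, no padding
my $truncated = sprintf("|%.3s|",  $word);  # precision truncates the string to 3 characters
my $both      = sprintf("|%6.3s|", $word);  # truncate to 3 characters, then pad to width 6
my $overflow  = sprintf("|%3d|",   12345);  # value wider than the field is not cut off

print "$_\n" for $fixed, $precise, $truncated, $both, $overflow;
```

This should print |   3.142|, |3.14|, |Ban|, |   Ban| and |12345|, one per line.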

OutputFormatting1.pl

#!/usr/bin/env perl
use utf8;
use strict;
use warnings;

MAIN:
{
	my $a = 326;
	my $b = -1;
	my $c = 2015;
	my $i1 = 65000;
	my $i2 = -2;
	my $i3 = 3261963;
	my $f1 = 3.1415926;
	my $f2 = 2.99792458e9;
	my $f3 = 1.234e-4;
	my $c1 = ord("A");
	my $c2 = ord("B");
	my $c3 = ord("C");
	my $s1 = "Apples";
	my $s2 = "and";
	my $s3 = "Bananas";
	my $b1 = 1;
	my $b2 = 0;

	printf("Decimals: %d %d %d\n", $a, $b, $c);
	printf("Unsigned Decimals: %u %u %u\n", $a, $b, $c);
	printf("Hexadecimals: %#x %#x %#x\n", $a, $b, $c);
	printf("Long Decimals: %d %d %d\n", $i1, $i2, $i3);
	printf("Long Hexadecimals: %016x %016x %016x\n", $i1, $i2, $i3);

	printf("Fixed FP: %f %f %f\n", $f1, $f2, $f3);
	printf("Exponential FP: %e %e %e\n", $f1, $f2, $f3);
	printf("General FP: %g %g %g\n", $f1, $f2, $f3);
	printf("General FP with precision: %.2g %.2g %.2g\n", $f1, $f2, $f3);

	# Perl's printf() has no Boolean specifier; %b formats an integer in binary,
	# so the values 1 and 0 print as "1" and "0".
	printf("Boolean: %b %b\n", $b1, $b2);
	printf("Character: %c %c %c\n", $c1, $c2, $c3);
	printf("String: %s %s %s\n", $s1, $s2, $s3);
}

Output

$ perl OutputFormatting1.pl
Decimals: 326 -1 2015
Unsigned Decimals: 326 18446744073709551615 2015
Hexadecimals: 0x146 0xffffffffffffffff 0x7df
Long Decimals: 65000 -2 3261963
Long Hexadecimals: 000000000000fde8 fffffffffffffffe 000000000031c60b
Fixed FP: 3.141593 2997924580.000000 0.000123
Exponential FP: 3.141593e+00 2.997925e+09 1.234000e-04
General FP: 3.14159 2.99792e+09 0.0001234
General FP with precision: 3.1 3e+09 0.00012
Boolean: 1 0
Character: A B C
String: Apples and Bananas

C-Style sprintf()

Along with the printf() function, the C language introduced a similar function called sprintf(). Instead of printing a formatted string to the console ([[stdout]]), it formats and then returns a string. This more general form has many uses not only for console output but for [[GUI]] output and file output.
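
Because sprintf() returns the formatted text rather than printing it, the result can be stored and reused. The short sketch below builds a label and a file name from the same values; the names $label, $banner and $filename are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $page  = 7;
my $total = 120;

# sprintf() returns the formatted string instead of writing it to stdout.
my $label = sprintf("Page %03d of %03d", $page, $total);

# The returned string is an ordinary value that can be reused anywhere.
my $banner   = "[" . $label . "]";
my $filename = sprintf("page-%03d.txt", $page);

print "$banner -> $filename\n";
```

This should print [Page 007 of 120] -> page-007.txt.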

OutputFormatting2.pl

#!/usr/bin/env perl
use utf8;
use strict;
use warnings;

MAIN:
{
	my $a = 326;
	my $b = -1;
	my $c = 2015;
	my $i1 = 65000;
	my $i2 = -2;
	my $i3 = 3261963;
	my $f1 = 3.1415926;
	my $f2 = 2.99792458e9;
	my $f3 = 1.234e-4;
	my $c1 = ord("A");
	my $c2 = ord("B");
	my $c3 = ord("C");
	my $s1 = "Apples";
	my $s2 = "and";
	my $s3 = "Bananas";
	my $b1 = 1;
	my $b2 = 0;

	my $s;

	$s = sprintf("Decimals: %d %d %d", $a, $b, $c);
	print $s, "\n";
	$s = sprintf("Unsigned Decimals: %u %u %u", $a, $b, $c);
	print $s, "\n";
	$s = sprintf("Hexadecimals: %#x %#x %#x", $a, $b, $c);
	print $s, "\n";
	$s = sprintf("Long Decimals: %d %d %d", $i1, $i2, $i3);
	print $s, "\n";
	$s = sprintf("Long Hexadecimals: %016x %016x %016x", $i1, $i2, $i3);
	print $s, "\n";

	$s = sprintf("Fixed FP: %f %f %f", $f1, $f2, $f3);
	print $s, "\n";
	$s = sprintf("Exponential FP: %e %e %e", $f1, $f2, $f3);
	print $s, "\n";
	$s = sprintf("General FP: %g %g %g", $f1, $f2, $f3);
	print $s, "\n";
	$s = sprintf("General FP with precision: %.2g %.2g %.2g", $f1, $f2, $f3);
	print $s, "\n";

	# %b formats an integer in binary; 1 and 0 therefore print as "1" and "0".
	$s = sprintf("Boolean: %b %b", $b1, $b2);
	print $s, "\n";
	$s = sprintf("Character: %c %c %c", $c1, $c2, $c3);
	print $s, "\n";
	$s = sprintf("String: %s %s %s", $s1, $s2, $s3);
	print $s, "\n";
}

Output

$ perl OutputFormatting2.pl
Decimals: 326 -1 2015
Unsigned Decimals: 326 18446744073709551615 2015
Hexadecimals: 0x146 0xffffffffffffffff 0x7df
Long Decimals: 65000 -2 3261963
Long Hexadecimals: 000000000000fde8 fffffffffffffffe 000000000031c60b
Fixed FP: 3.141593 2997924580.000000 0.000123
Exponential FP: 3.141593e+00 2.997925e+09 1.234000e-04
General FP: 3.14159 2.99792e+09 0.0001234
General FP with precision: 3.1 3e+09 0.00012
Boolean: 1 0
Character: A B C
String: Apples and Bananas

String Interpolation

Many languages have the ability to replace variables written directly into strings. This variable substitution in strings is known as variable interpolation. In Perl we simply place a scalar variable in a double-quoted string. Double-quoted strings provide interpolation, whereas single-quoted strings do not. While not always necessary, if there is a possibility of confusion between the variable name and other text in the string, we can wrap the variable name in curly brackets, as in ${variable_name}. To embed a constant in a string, a backslash (\) must be used before the name of the constant, as in ${\constant_name}. To interpolate an arbitrary expression, use the syntax @{[expression]}.
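
The four interpolation forms described above can be compared side by side in a small sketch. The constant TAX_YEAR and the variables $fruit and $count are hypothetical names chosen for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use constant TAX_YEAR => 2015;   # hypothetical constant, for illustration only

my $fruit = "apple";
my $count = 3;

my $single = 'I have $count ${fruit}s';           # single quotes: text is kept verbatim
my $double = "I have $count ${fruit}s";           # braces separate the name from the "s"
my $const  = "Tax year: ${\ TAX_YEAR}";           # constant embedded via a reference
my $expr   = "Twice as many: @{[ $count * 2 ]}";  # arbitrary expression

print "$_\n" for $single, $double, $const, $expr;
```

The first line prints the text unchanged, while the remaining lines should print I have 3 apples, Tax year: 2015 and Twice as many: 6. Without the curly brackets, Perl would look for a variable named $fruits.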

OutputFormatting3.pl

#!/usr/bin/env perl
use utf8;
use strict;
use warnings;

MAIN:
{
	my $FICA_RATE = 0.0765;
	my $PAY_RATE = 20.;
	my $annual_salary;

	$annual_salary = $PAY_RATE * 2080.0;
	# Single-quoted strings will not interpolate variables like ${annual_salary}.
	# Use double-quoted strings if you want variable interpolation.
	print "FICA Tax Rate: ${FICA_RATE}\n";
	print "Annual Salary: ${annual_salary}\n";
	print "FICA Tax: @{[$FICA_RATE * $annual_salary]}\n";
}

Output

$ perl OutputFormatting3.pl
FICA Tax Rate: 0.0765
Annual Salary: 41600
FICA Tax: 3182.4

Modern Message Formatting

The methods for formatting strings and output discussed so far have some limitations when it comes to localizing software. The positional approach taken by the printf()-style functions poses difficulties for localization because translation often changes the order of words, and thus of the substitution specifiers, but the hard-coded argument list in our code cannot change to match. Variable interpolation has its own drawback: you are actually putting code into the strings. When externalizing the strings for localization (removing them from the code and putting them in a separate file), it is not desirable to export variables and expressions from our code to the localization file, where they can be changed.
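
The ordering problem can be seen in a small sketch. The second format string stands in for a hypothetical translation whose sentence structure names the author first; the argument list in the code stays the same, so the values land in the wrong slots.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $title  = "Dune";
my $author = "Frank Herbert";

# Original pattern: title first, author second, matching the argument order.
my $english = sprintf("%s by %s", $title, $author);

# Hypothetical translated pattern that expects the author first.
# The hard-coded argument order no longer matches the specifiers.
my $reordered = sprintf("%s wrote %s", $title, $author);

print "$english\n";
print "$reordered\n";
```

The first line reads correctly (Dune by Frank Herbert), but the second comes out as Dune wrote Frank Herbert.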

This is why message formatting approaches have been developed that use substitution specifiers that specify which argument to the message format function is to be used. This allows the substitution specifiers to be in a different order (and perhaps re-ordered during localization) than the formal arguments to the message format function. These substitution specifiers are also not executable code as is the case with variable interpolation so it is much safer to externalize from our program code as we will see in the section on Internationalization.

A messageFormat() function modelled after Python's format() function is provided in the [[Pure Programmer Perl Utils Module]]. Save the "Utils.pm" source in the same folder as your source files. Once this is done, add use Utils; to your program to make the module's functions available.

#!/usr/bin/env perl

use strict;
use warnings;
use utf8;

use Utils;

MAIN:
{
	binmode(STDOUT, ":utf8");

	my $first = "First";
	my $middle = "Middle";
	my $last = "Last";

	my $s = Utils::messageFormat("{2}, {0} {1}\n", $first, $middle, $last);
	print $s;
}

Pure Programmer Utils Example

The messageFormat() function, like Python's format(), uses a pair of curly brackets to identify each substitution placeholder in the format string. Each pair of curly brackets contains a number from 0 to the number of additional arguments minus 1. This number refers to the position in the argument list of the value that will be used in the substitution. This allows values in the argument list to be used in any order needed in the format string, or even used more than once.
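
To make the idea of numbered placeholders concrete, here is a minimal sketch of a substitution routine. The function simple_format() is hypothetical and is not the Utils implementation; it handles only bare position numbers such as {0}, with no type specifiers.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: replaces each {N} with the N-th additional argument.
# A placeholder with no matching argument is left in the string unchanged.
sub simple_format {
    my ($format, @args) = @_;
    $format =~ s/\{(\d+)\}/defined($args[$1]) ? $args[$1] : '{' . $1 . '}'/ge;
    return $format;
}

my $name   = simple_format("{2}, {0} {1}", "First", "Middle", "Last");
my $twice  = simple_format("{0} and {0} again", "echo");
my $author = simple_format("{1} wrote {0}", "Dune", "Frank Herbert");

print "$_\n" for $name, $twice, $author;
```

This should print Last, First Middle, then echo and echo again, then Frank Herbert wrote Dune. Because each placeholder names its argument, the format string can reorder or repeat values without any change to the argument list.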

format() Illustration — format() specifiers can match any argument

Following the position number in the substitution placeholder is an optional type specifier. The type specifier is separated from the position number by a colon. If you don't wish to use a type specifier then the type is assumed to be string. Some of the type specifiers available are listed in the table below.

Frequently Used Type Specifiers
Specifier	Value Type	Description
b	integer	binary
d	integer	decimal
o	integer	octal
x	integer	hexadecimal (lowercase)
X	integer	hexadecimal (uppercase)
f	double	fixed notation
e	double	scientific notation (lowercase)
E	double	scientific notation (uppercase)
g	double	shortest representation: e or f
G	double	shortest representation: E or f
c	integer	single character
s	string	string of characters

As we have seen with printf(), the format() specifiers can take optional modifiers that change how the value is formatted. The full definition of the specifiers, including optional components, is as follows:

{[arg-pos]:[fill-and-align][sign][#][0][width][.precision][type specifier]}

The fill-and-align option is an optional fill character (which can be any character other than { or }), followed by one of the align options <, >, ^. If no fill character is specified, then the space character is used.

Align Options
Option	Description
`<`	Left-justify within the given field width. This is the default for non-numeric types.
`>`	Right-justify within the given field width. This is the default for numeric types.
`^`	Aligns the value in the center of the field.
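
The fill-and-align behaviour can be sketched in plain Perl. The helper align_text() below is hypothetical and is not how Utils implements it; it assumes that when the padding cannot be split evenly for ^, the extra fill character goes on the right.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: pads $text to at least $width characters using $fill,
# aligned according to one of the options "<", ">" or "^".
sub align_text {
    my ($text, $width, $align, $fill) = @_;
    $fill = " " unless defined $fill;        # space is the default fill character
    my $pad = $width - length($text);
    return $text if $pad <= 0;               # the width is only a minimum
    return $text . ($fill x $pad) if $align eq "<";
    return ($fill x $pad) . $text if $align eq ">";
    # "^": split the padding; any odd extra character goes on the right
    my $left = int($pad / 2);
    return ($fill x $left) . $text . ($fill x ($pad - $left));
}

print "|", align_text("ke", 11, "<", "."), "|\n";
print "|", align_text("ke", 11, "^", "*"), "|\n";
print "|", align_text("ke", 11, ">", "-"), "|\n";
```

This should print |ke.........|, |****ke*****| and |---------ke|.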

The sign option is one of +, - or the space character.

Sign Options
Option	Description
+	Signifies that a sign character should be output for all values (`+` for positive and `-` for negative).
-	Signifies that a sign character should be output for negative values only. This is the default for numeric types.
<space>	Signifies that a leading space should be used for positive values and a `-` should be output for negative values.
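
Perl's built-in sprintf() has flags with the same meaning as these sign options, so their effect can be previewed without the Utils module. The sketch below uses sprintf() as a stand-in; it assumes that messageFormat() follows the table above.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my @values = (42, -42);

my @always   = map { sprintf("%+d", $_) } @values;  # '+': sign on every value
my @negative = map { sprintf("%d",  $_) } @values;  # default: sign on negatives only
my @spaced   = map { sprintf("% d", $_) } @values;  # space: leading space on positives

print join(",", @always), "\n";
print join(",", @negative), "\n";
print join(",", @spaced), "\n";
```

This should print +42,-42, then 42,-42, then  42,-42 (with a leading space before the first 42 on the last line).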

The # option causes alternate formatting to be used.

For integral types, when binary, octal, or hexadecimal presentation type is used, the alternate form inserts the prefix (0b, 0, or 0x) into the output value after the sign character.
For floating-point types, the alternate form causes the result of the format to always contain a decimal-point character, even if no digits follow it.

The 0 option pads the field with leading zeros (following any indication of sign or base) to the field width. If the 0 option and an align option both are used, the 0 option is ignored.

The width option is a positive decimal number. If present, it specifies the minimum field width. If the formatted value cannot fit within the field width, the entire value is inserted anyway, making the field larger than the width.

The precision option is a . followed by a non-negative decimal number. This option indicates the precision to use when formatting a value. For floating-point types, precision specifies the formatting precision, i.e. the number of places after the decimal point to display. For string types, it specifies how many characters from the string will be used.

OutputFormatting4.pl

#!/usr/bin/env perl
use utf8;
use Utils;
use strict;
use warnings;

MAIN:
{
	binmode(STDOUT, ":utf8");
	binmode(STDERR, ":utf8");
	binmode(STDIN, ":utf8");
	my $first = "First";
	my $middle = "Middle";
	my $lastname = "Last";
	my $left = "Left";
	my $center = "Center";
	my $right = "Right";
	my $favorite = "kerfuffle";
	my $i1 = 3261963;
	my $i2 = -42;
	my $fp1 = 3.1415926;
	my $fp2 = 2.99792458e9;
	my $fp3 = -1.234e-4;

	my $s = Utils::messageFormat("{2}, {0} {1}", $first, $middle, $lastname);
	print $s, "\n";
	$s = Utils::messageFormat("{0} {1} {2}", $left, $center, $right);
	print $s, "\n";
	$s = Utils::messageFormat("Favorite number is {0}", $i1);
	print $s, "\n";
	$s = Utils::messageFormat("Favorite FP is {0}", $fp1);
	print $s, "\n";

	my $c = ord(substr($favorite, 0, 1));
	print Utils::messageFormat("Favorite c is {0:c}", $c), "\n";
	print Utils::messageFormat("Favorite 11c is   |{0:11c}|", $c), "\n";
	print Utils::messageFormat("Favorite <11c is  |{0:<11c}|", $c), "\n";
	print Utils::messageFormat("Favorite ^11c is  |{0:^11c}|", $c), "\n";
	print Utils::messageFormat("Favorite >11c is  |{0:>11c}|", $c), "\n";
	print Utils::messageFormat("Favorite .<11c is |{0:.<11c}|", $c), "\n";
	print Utils::messageFormat("Favorite _^11c is |{0:_^11c}|", $c), "\n";
	print Utils::messageFormat("Favorite  >11c is |{0: >11c}|", $c), "\n";

	$c = 0x1F92F;
	print Utils::messageFormat("Favorite emoji c is {0:c}", $c), "\n";

	print Utils::messageFormat("Favorite s is {0:s}", $favorite), "\n";
	print Utils::messageFormat("Favorite .2s is {0:.2s}", $favorite), "\n";
	print Utils::messageFormat("Favorite 11s is     |{0:11s}|", $favorite), "\n";
	print Utils::messageFormat("Favorite 11.2s is   |{0:11.2s}|", $favorite), "\n";
	print Utils::messageFormat("Favorite <11.2s is  |{0:<11.2s}|", $favorite), "\n";
	print Utils::messageFormat("Favorite ^11.2s is  |{0:^11.2s}|", $favorite), "\n";
	print Utils::messageFormat("Favorite >11.2s is  |{0:>11.2s}|", $favorite), "\n";
	print Utils::messageFormat("Favorite .<11.2s is |{0:.<11.2s}|", $favorite), "\n";
	print Utils::messageFormat("Favorite *^11.2s is |{0:*^11.2s}|", $favorite), "\n";
	print Utils::messageFormat("Favorite ->11.2s is |{0:->11.2s}|", $favorite), "\n";

	print Utils::messageFormat("Favorite d is {0:d}", $i1), "\n";
	print Utils::messageFormat("Another d is {0:d}", $i2), "\n";
	print Utils::messageFormat("Favorite b is {0:b}", $i1), "\n";
	print Utils::messageFormat("Another B is {0:b}", $i2), "\n";
	print Utils::messageFormat("Favorite o is {0:o}", $i1), "\n";
	print Utils::messageFormat("Another o is {0:o}", $i2), "\n";
	print Utils::messageFormat("Favorite x is {0:x}", $i1), "\n";
	print Utils::messageFormat("Another X is {0:X}", $i2), "\n";
	print Utils::messageFormat("Favorite #b is {0:#b}", $i1), "\n";
	print Utils::messageFormat("Another #B is {0:#b}", $i2), "\n";
	print Utils::messageFormat("Favorite #o is {0:#o}", $i1), "\n";
	print Utils::messageFormat("Another #o is {0:#o}", $i2), "\n";
	print Utils::messageFormat("Favorite #x is {0:#x}", $i1), "\n";
	print Utils::messageFormat("Another #X is {0:#X}", $i2), "\n";
	print Utils::messageFormat("Favorite 11d is   |{0:11d}|", $i1), "\n";
	print Utils::messageFormat("Favorite +11d is  |{0:+11d}|", $i1), "\n";
	print Utils::messageFormat("Favorite 011d is  |{0:011d}|", $i1), "\n";
	print Utils::messageFormat("Favorite 011x is  |{0:011x}|", $i1), "\n";
	print Utils::messageFormat("Favorite #011x is |{0:#011x}|", $i1), "\n";

	print Utils::messageFormat("Favorite f is {0:f}", $fp1), "\n";
	print Utils::messageFormat("Another f is {0:f}", $fp2), "\n";
	print Utils::messageFormat("One more f is {0:f}", $fp3), "\n";
	print Utils::messageFormat("Favorite e is {0:e}", $fp1), "\n";
	print Utils::messageFormat("Another e is {0:e}", $fp2), "\n";
	print Utils::messageFormat("One more e is {0:e}", $fp3), "\n";
	print Utils::messageFormat("Favorite g is {0:g}", $fp1), "\n";
	print Utils::messageFormat("Another g is {0:g}", $fp2), "\n";
	print Utils::messageFormat("One more g is {0:g}", $fp3), "\n";
	print Utils::messageFormat("Favorite .2f is {0:.2f}", $fp1), "\n";
	print Utils::messageFormat("Another .2f is {0:.2f}", $fp2), "\n";
	print Utils::messageFormat("One more .2f is {0:.2f}", $fp3), "\n";
	print Utils::messageFormat("Favorite .2e is {0:.2e}", $fp1), "\n";
	print Utils::messageFormat("Another .2e is {0:.2e}", $fp2), "\n";
	print Utils::messageFormat("One more .2e is {0:.2e}", $fp3), "\n";
	print Utils::messageFormat("Favorite .2g is {0:.2g}", $fp1), "\n";
	print Utils::messageFormat("Another .2g is {0:.2g}", $fp2), "\n";
	print Utils::messageFormat("One more .2g is {0:.2g}", $fp3), "\n";
	print Utils::messageFormat("Favorite 15.2f is |{0:15.2f}|", $fp1), "\n";
	print Utils::messageFormat("Another 15.2e is  |{0:15.2e}|", $fp2), "\n";
	print Utils::messageFormat("One more 15.2g is |{0:15.2g}|", $fp3), "\n";
}

Output

$ perl OutputFormatting4.pl
Last, First Middle
Left Center Right
Favorite number is 3261963
Favorite FP is 3.1415926
Favorite c is k
Favorite 11c is   |k          |
Favorite <11c is  |k          |
Favorite ^11c is  |     k     |
Favorite >11c is  |          k|
Favorite .<11c is |k..........|
Favorite _^11c is |_____k_____|
Favorite  >11c is |          k|
Favorite emoji c is 🤯
Favorite s is kerfuffle
Favorite .2s is ke
Favorite 11s is     |kerfuffle  |
Favorite 11.2s is   |ke         |
Favorite <11.2s is  |ke         |
Favorite ^11.2s is  |    ke     |
Favorite >11.2s is  |         ke|
Favorite .<11.2s is |ke.........|
Favorite *^11.2s is |****ke*****|
Favorite ->11.2s is |---------ke|
Favorite d is 3261963
Another d is -42
Favorite b is 1100011100011000001011
Another B is -101010
Favorite o is 14343013
Another o is -52
Favorite x is 31c60b
Another X is -2A
Favorite #b is 0b1100011100011000001011
Another #B is -0b101010
Favorite #o is 014343013
Another #o is -052
Favorite #x is 0x31c60b
Another #X is -0X2A
Favorite 11d is   |    3261963|
Favorite +11d is  |   +3261963|
Favorite 011d is  |00003261963|
Favorite 011x is  |0000031c60b|
Favorite #011x is |0x00031c60b|
Favorite f is 3.141593
Another f is 2997924580.000000
One more f is -0.000123
Favorite e is 3.141593e+00
Another e is 2.997925e+09
One more e is -1.234000e-04
Favorite g is 3.14159
Another g is 2.99792e+09
One more g is -0.0001234
Favorite .2f is 3.14
Another .2f is 2997924580.00
One more .2f is -0.00
Favorite .2e is 3.14e+00
Another .2e is 3.00e+09
One more .2e is -1.23e-04
Favorite .2g is 3.1
Another .2g is 3e+09
One more .2g is -0.00012
Favorite 15.2f is |           3.14|
Another 15.2e is  |       3.00e+09|
One more 15.2g is |       -0.00012|

Questions

What is the difference in output between the printf() specifier "%8d" and "%8x"?
What is the difference in output between the printf() specifier "%8d" and "%-8d"?
What is the difference in output between the printf() specifier "%8d" and "%+8d"?
What is the difference in output between the printf() specifier "%f" and "%e"?
What is the difference in output between the printf() specifier "%f" and "%g"?
How many characters from the string are printed when using "%10.8s"?
How many characters in total are printed when using "%10.8s"?
When using "%5.3f" how is 34.5678 formatted?
When using "%.1f%%" how is 12.345 formatted?
What is the difference in output between the format() specifier {0:010x} and {0:+10x}?
What is the difference in output between the format() specifier {0:+5.2f} and {0:-5.2f}?
What is the difference in output between the format() specifier {0:^20s} and {0:<20s}?

Projects

More ★'s indicate higher difficulty level.

References

[[PerlDoc sprintf]]