Basic Data Types

Integers

Computers can only perform operations on numbers. Modern CPUs operate on two kinds of numbers: integers and floating point numbers. Integers are the most basic and the most commonly used. They are even used to represent characters, based on character set mappings such as ASCII or Unicode.

To understand the capabilities of the integers supported by the computer, it is important to note that computers do not store integers (or floating point numbers) using the decimal (base 10) digits that we are accustomed to using. Instead, computers store numbers using [[binary]] (base 2). In binary there are only two digits: 0 and 1. Therefore a single binary digit, or bit, can hold one of two possible values. If we combine two bits, we can represent four different values. With three bits we can represent eight different values, and four bits give us 16 values. This doubling of the number of values with each added bit gives us the following formula for the number of values that can be stored with a given number of bits.

values = 2ⁿ where n represents the number of bits.
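To make the formula concrete, here is a small Perl sketch that evaluates 2ⁿ for a few bit counts. The helper name value_count is hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: number of distinct values that fit in $bits bits (2 ** $bits)
sub value_count {
    my ($bits) = @_;
    return 2 ** $bits;
}

# Each added bit doubles the count; 8 bits (one byte) gives 256
for my $bits (1, 2, 3, 4, 8) {
    printf "%d bit(s) -> %d values\n", $bits, value_count($bits);
}
```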

The smallest group of bits that most computers address directly is a group of eight bits called a byte. Since 2⁸ = 256, a byte allows us to represent 256 different values. Combining bytes gives us larger and larger ranges of values. Because we want to represent negative as well as positive numbers, we use half of the range to represent negative values. Integers that can store positive or negative numbers are called signed integers.

Integer Specs
# bits	# bytes	# values	Signed Range
8	1	256	-128 to 127
16	2	65,536	-32,768 to 32,767
32	4	4,294,967,296	-2,147,483,648 to 2,147,483,647
64	8	18,446,744,073,709,551,616	-9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
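The signed ranges in the table follow the usual two's complement pattern: -2ⁿ⁻¹ through 2ⁿ⁻¹ - 1. The sketch below recomputes the table rows in Perl. It uses the core Math::BigInt module because the 64-bit values are too large to hold exactly in a double; the helper signed_range is a hypothetical name for illustration.

```perl
use strict;
use warnings;
use Math::BigInt;   # core module: exact arbitrary-size integers

# Hypothetical helper: signed (two's complement) range for $bits bits,
# i.e. -(2 ** ($bits - 1)) through 2 ** ($bits - 1) - 1
sub signed_range {
    my ($bits) = @_;
    my $half = Math::BigInt->new(2) ** ($bits - 1);
    return (-$half, $half - 1);
}

for my $bits (8, 16, 32, 64) {
    my ($min, $max) = signed_range($bits);
    my $count = Math::BigInt->new(2) ** $bits;   # total number of values
    print "$bits bits: $count values, $min to $max\n";
}
```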

For many years, computers that operated primarily on 32-bit integers were common. Modern desktop computers, along with most current tablets and cell phones, operate primarily on 64-bit integers, but 32-bit CPUs are still commonly embedded in other devices such as routers and appliances.
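Which integer size you get in Perl depends on how perl itself was built, which usually (but not always) matches the CPU. A quick way to check, sketched below, is the ivsize entry of the core Config module, which gives the size in bytes of perl's native integer.

```perl
use strict;
use warnings;
use Config;   # core module describing how this perl was built

# ivsize: bytes in perl's native signed integer type (IV)
my $int_bits = $Config{ivsize} * 8;
print "This perl uses $int_bits-bit integers\n";

# ~0 inverts every bit of zero, giving the largest native unsigned integer
print "Largest native unsigned integer: ", ~0, "\n";
```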

Integer [[literals]] are written in our programs much as you would expect, in decimal (base 10) notation using the digits 0-9. They can also be written in [[hexadecimal]] notation (base 16), which is handy in some situations. Hexadecimal values are written using the 16 digits 0-9 and A-F (upper or lower case) and must be prefixed with 0x. Hexadecimal notation is a convenient way to write values whose binary representation matters, because each hexadecimal digit represents exactly four binary digits. The following are examples of valid integer literals in both decimal and hexadecimal notation.

Example Integer Literals
Decimal	Hexadecimal
0	0x0
8	0x8
15	0xf
326	0x146
43981	0xabcd
439041101	0x1a2b3c4d
-326	-0x146
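The literals in the table can be checked directly in Perl. This sketch uses only core Perl: it compares some decimal/hexadecimal pairs, converts between the two forms with sprintf and hex, and prints each hex digit of 0xabcd as its four bits.

```perl
use strict;
use warnings;

# Each pair is the same integer written two ways (both print in decimal)
for my $pair ([326, 0x146], [43981, 0xabcd], [-326, -0x146]) {
    my ($dec, $hex) = @$pair;
    print "$dec equals $hex\n" if $dec == $hex;
}

printf "%d in hex is %#x\n", 43981, 43981;   # %#x adds the 0x prefix
print hex('0xABCD'), "\n";                    # hex() parses hex text; case does not matter

# One hex digit always maps to exactly four bits
for my $digit (split //, 'abcd') {
    printf "%s = %04b\n", $digit, hex($digit);
}
```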

Questions

In the table of integer specs above, the number of values is always a number that ends in the digit 6. Why is this?
Integers with 128 bits would give us 2¹²⁸ ≈ 3.4 × 10³⁸ different values. Is this a useful data type or excessive?

Projects

More ★'s indicate higher difficulty level.

Floating Point

To represent fractional values we need another data type in addition to integers. This is where floating point data types come in. Floating point data types can represent numbers with both a whole part and a fractional part. There are two commonly used floating point representations: 32-bit singles and 64-bit doubles. The bit patterns of these representations are defined in a specification called [[IEEE 754]], first published in 1985.
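One way to see IEEE 754 bit patterns from Perl is to pack a number into its single ('f') or double ('d') binary form and dump the bytes in hex. This sketch assumes the platform's native floats are IEEE 754, which holds on virtually all current hardware; the '>' modifier (Perl 5.10 and later) forces big-endian order so the sign and exponent bits print first.

```perl
use strict;
use warnings;

# pack 'f' = native single (4 bytes), 'd' = native double (8 bytes);
# '>' forces big-endian byte order so the sign/exponent bits come first
my $single_hex = unpack('H*', pack('f>', 1.0));
my $double_hex = unpack('H*', pack('d>', 1.0));

print "1.0 as an IEEE single: $single_hex\n";
print "1.0 as an IEEE double: $double_hex\n";

# Negating the value only flips the leading sign bit
print "-1.0 as an IEEE double: ", unpack('H*', pack('d>', -1.0)), "\n";
```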

Floating Point Specs
Name	Bits	Precision (Decimal Digits)	Exponent Min	Exponent Max
Single	32	~7	~-38	~+38
Double	64	~16	~-308	~+308

It would seem that [[IEEE singles]] would result in faster computations because they are smaller data types (4 bytes) than [[IEEE doubles]] (8 bytes). In practice, modern processors have native hardware support for doubles, so scalar arithmetic on doubles is typically about as fast as on singles for most operations. On a default build, Perl represents all floating point values, including floating point [[literals]], as doubles.
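The precision difference is easy to observe. In this sketch, the hypothetical helper as_single round-trips a value through a 4-byte single using pack and unpack, while the original value stays a double. It also prints the floating point type reported by the core Config module, which is double on a default build.

```perl
use strict;
use warnings;
use Config;

print "perl floating point type: $Config{nvtype} ($Config{nvsize} bytes)\n";

# Hypothetical helper: squeeze a value through a 4-byte single and back
sub as_single { return unpack('f', pack('f', $_[0])) }

my $x = 0.1;                                  # stored as a double
printf "as double: %.10g\n", $x;
printf "as single: %.10g\n", as_single($x);   # extra digits reveal the single's rounding error
```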

Floating point [[literals]] are written using standard base 10 decimal notation. They can also be written in a form of [[scientific notation]] that uses the letter 'e' to indicate a power of ten multiplier, so a number like 1.23 × 10⁴ is written as 1.23e4. Scientific notation shows how the Floating Point Specs listed above apply. The Precision in Decimal Digits is roughly the number of significant digits that can be stored in the mantissa (the number before the 'e'). The Exponent Min and Max are roughly the smallest and largest powers of ten (the number after the 'e') that can be represented.

Example Floating Point Literals
Fixed Point	Scientific Notation
0	0e0
3.1415	3.1415e0
15	1.5e1
326	3.26e2
0.123	1.23e-1
-3.1415	-3.1415e0
-0.001	-1e-3
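The fixed point and scientific forms in the table are the same numbers to Perl. This sketch compares a couple of them, formats values with sprintf's %e and %g, and shows that going past the largest double (about 1.8 × 10³⁰⁸) overflows to infinity; how infinity is printed can vary between perl versions.

```perl
use strict;
use warnings;

# Fixed point and scientific notation literals denote the same values
print "1.5e1 equals 15\n"   if 1.5e1 == 15;
print "3.26e2 equals 326\n" if 3.26e2 == 326;

# %e always prints scientific notation; %g chooses the shorter form
printf "%e\n", 326;
printf "%g\n", 1.23e-1;

# Exceeding the double exponent range overflows to infinity
my $too_big = 1e308 * 10;
print "1e308 * 10 = $too_big\n";
```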

Questions

What types of computations would benefit from the added precision of [[IEEE doubles]] over [[IEEE singles]]?
What types of computations would benefit from the added exponent range of [[IEEE doubles]] over [[IEEE singles]]?
What types of computations would be best done as integers instead of floating point?

Projects

More ★'s indicate higher difficulty level.

Booleans

The simplest of all basic data types is the [[Boolean]], named after [[George Boole]]. [[Booleans]] can only be one of two possible values: true or false. Unfortunately Perl doesn't have a [[Boolean]] data type. Instead it treats the number 0, the strings '0' and '', and undef as false, and anything else as true. This can lead to surprises (for example, the string '0.0' is true even though the number 0.0 is false), so you must be careful when dealing with [[Boolean]] expressions in Perl. If you need a [[literal]] value to represent true or false, it is best to stick with 1 and 0, respectively.
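Perl's truthiness rules are easy to get wrong, so it can help to test them. This sketch runs a handful of values through a hypothetical helper, truth, which simply evaluates its argument in a Boolean context.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if perl treats the value as true, 0 otherwise
sub truth { return $_[0] ? 1 : 0 }

my @cases = (
    ['0',     0],       # number zero: false
    ["'0'",   '0'],     # string "0": false
    ["''",    ''],      # empty string: false
    ['undef', undef],   # undefined: false
    ['0.0',   0.0],     # number 0.0 is just zero: false
    ["'0.0'", '0.0'],   # any other string is true, even this one
    ["'00'",  '00'],
    ["' '",   ' '],
);

for my $case (@cases) {
    my ($label, $value) = @$case;
    printf "%-6s is %s\n", $label, truth($value) ? 'true' : 'false';
}
```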

Questions

Why are Boolean values named after George Boole?

Projects

More ★'s indicate higher difficulty level.

Boolean Representation

Characters and Strings

Computers store characters as integer codes based on a particular coding scheme such as [[ASCII]], [[ISO Latin-1]] or [[Unicode]]. Even so, we treat characters as a distinct type separate from integers, because integers support arithmetic operations whereas characters do not. Perl characters can be encoded using multiple bytes, so in addition to [[ASCII]]/[[ISO Latin-1]] characters they can represent any [[Unicode]] character. If your source code contains literal non-ASCII characters, the use utf8; declaration should appear at the beginning of your Perl program so that Perl reads the source as UTF-8. Perl has no separate character type: a character is simply a one-character string, enclosed in single or double quotes. Alternatively, a character can be written by its hexadecimal code inside a double-quoted string, either as \x followed by up to 2 hexadecimal digits or as \x{} wrapped around any number of hexadecimal digits (such as the 4 digits needed for many Unicode characters).
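The sketch below illustrates both points with core Perl: ord and chr convert between a character and its integer code, and the core Encode module shows that a single Unicode character such as ♠ (U+2660) occupies several bytes once encoded as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(encode);   # core module: convert characters to encoded bytes

# ord gives a character's integer code; chr goes the other way
print ord('A'), "\n";
print chr(0x2660) eq "\x{2660}" ? "chr matches \\x{}\n" : "mismatch\n";

# One character to perl, but several bytes once encoded as UTF-8
my $spade = "\x{2660}";                  # BLACK SPADE SUIT
my $bytes = encode('UTF-8', $spade);
printf "%d character, %d bytes in UTF-8\n", length($spade), length($bytes);
```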

Strings, then, are merely sequences of zero or more characters. An empty string (or null string) has no characters. Perl string [[literals]] are enclosed in single or double quotes. Single-quoted strings don't allow [[variable interpolation]] and recognize only the escapes \\ and \', whereas double-quoted strings allow both variable interpolation and escape sequences such as \n and \x41.
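The difference between the two kinds of quotes shows up as soon as a string contains a variable or an escape sequence. In this sketch, $lang is a hypothetical variable used only for illustration.

```perl
use strict;
use warnings;

my $lang = 'Perl';               # hypothetical variable for illustration

my $single = 'Hello, $lang\n';   # single quotes: no interpolation, \n stays as two characters
my $double = "Hello, $lang\n";   # double quotes: $lang is interpolated, \n becomes a newline
my $empty  = '';                 # empty (null) string

print $single, "\n";             # prints: Hello, $lang\n
print $double;                   # prints: Hello, Perl
print 'empty string length: ', length($empty), "\n";
```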

Example Character & String Literals
Type	Literal
String	"ABCD"
String	'ABCD'
String	"\x65\x66\x67\x68"
String	"\x{2660}\x{2665}\x{2663}\x{2666}"
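These escapes can be checked in Perl. Codes 0x65 through 0x68 are the lowercase letters e through h; the uppercase letters A through D are 0x41 through 0x44. The sketch below compares both, counts the characters in the card suit string, and shows that single quotes leave an escape untouched.

```perl
use strict;
use warnings;

# \xHH takes up to two hex digits; \x{...} takes any number of hex digits
my $lower = "\x65\x66\x67\x68";                   # codes 0x65-0x68: e, f, g, h
my $upper = "\x41\x42\x43\x44";                   # codes 0x41-0x44: A, B, C, D
my $suits = "\x{2660}\x{2665}\x{2663}\x{2666}";   # four card suit symbols

print "$lower $upper\n";
print length($suits), " characters in the suit string\n";

# Escapes are only processed inside double quotes
print length('\x41'), " characters in '\\x41'\n";
```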

Questions

Why is ASCII support still important?
Why is UTF-8 encoded Unicode backward compatible with ASCII?
Why is Unicode support so important?
Why is Unicode probably the last character coding scheme that will ever be developed?

Projects

More ★'s indicate higher difficulty level.

References

[[Unicode Charts]]
[[Unicode Lookup]]