# Basic Data Types

## Integers

Computers can only perform operations on numbers. Modern CPUs can perform operations on integer and floating point numbers. The most basic and most commonly used are the integers. Integers are even used to represent characters, through character set mappings such as ASCII or Unicode.

To understand the capabilities of the integers supported by the computer, it is important to note that computers do not store integers (or floating point numbers) using the decimal (base 10) digits that we are accustomed to. Instead, computers store numbers using [[binary]] (base 2). In binary there are only two digits: 0 and 1. A single binary digit, or bit, can therefore hold one of two possible values. If we combine two bits, we can represent four different values. With three bits we can represent eight different values, and four bits give us 16 values. Because each added bit doubles the number of values that can be represented, we get the following formula for the number of values that can be stored with a given number of bits.

values = 2^{n} where n represents the number of bits.
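The doubling described above can be checked by listing every bit pattern of a given width. The following is a minimal sketch in Python 3; the helper name `bit_patterns` is made up for this illustration.

```python
from itertools import product

def bit_patterns(n):
    """Return every distinct pattern of n bits as a string of 0s and 1s."""
    return ["".join(bits) for bits in product("01", repeat=n)]

for n in range(1, 5):
    patterns = bit_patterns(n)
    # The number of patterns always matches 2 ** n.
    print(n, len(patterns), 2 ** n)

print(bit_patterns(2))  # ['00', '01', '10', '11']
```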

The smallest group of bits that computers work with is a group of eight bits called a byte. A byte allows us to represent 256 different values. Groups of bytes combine to give us larger and larger ranges of values. Because we want to represent negative as well as positive numbers, we use half the range of values to represent negative values. Integers that can store positive or negative numbers are called signed integers.

Bits | Bytes | Values | Signed Range |
---|---|---|---|
8 | 1 | 256 | -128 to 127 |
16 | 2 | 65,536 | -32,768 to 32,767 |
32 | 4 | 4,294,967,296 | -2,147,483,648 to 2,147,483,647 |
64 | 8 | 18,446,744,073,709,551,616 | -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
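The entries in this table can be reproduced with a few lines of Python 3. This is only a sketch: the function name `signed_range` is hypothetical, and it assumes the split implied by the ranges shown, where the negative half gets one more value than the positive half because zero occupies one of the non-negative slots.

```python
def signed_range(bits):
    """Return (number of values, minimum, maximum) for a signed integer of the given width."""
    values = 2 ** bits
    minimum = -(values // 2)      # half of the values are negative
    maximum = values // 2 - 1     # the other half holds zero and the positives
    return values, minimum, maximum

for bits in (8, 16, 32, 64):
    values, minimum, maximum = signed_range(bits)
    print(f"{bits:2d} bits: {values:,} values, {minimum:,} to {maximum:,}")
```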

For many years, computers that operated primarily on 32-bit integers were common. Modern desktop computers typically operate primarily on 64-bit integers, but 32-bit CPUs are still commonly embedded in devices other than desktop computers, such as routers and older tablets and cell phones.

Integer [[literals]] are written in our programs much as you would expect, in decimal (base 10) notation using the digits 0-9. They can also be written in [[hexadecimal]] notation (base 16), using the 16 digits 0-9 and A-F and prefixed with `0x`. Hexadecimal notation is a handy way to write values that map easily to their binary representation, because each hexadecimal digit represents exactly four binary digits. The following are examples of valid integer literals in both decimal and hexadecimal notation.

Decimal | Hexadecimal |
---|---|
0 | 0x0 |
8 | 0x8 |
15 | 0xf |
326 | 0x146 |
43981 | 0xabcd |
439041101 | 0x1a2b3c4d |
-326 | -0x146 |
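To see that both notations denote the same values, and that each hexadecimal digit corresponds to four binary digits, the literals from the table can be compared directly in Python 3. The helper `hex_to_bits` below is a name invented for this sketch; `hex`, `int` and `format` are Python built-ins.

```python
# Each pair holds the same integer written in decimal and in hexadecimal.
pairs = [(0, 0x0), (8, 0x8), (15, 0xf), (326, 0x146),
         (43981, 0xabcd), (439041101, 0x1a2b3c4d), (-326, -0x146)]
all_equal = all(dec == hx for dec, hx in pairs)
print(all_equal)  # True

def hex_to_bits(text):
    """Expand each hexadecimal digit into its group of four binary digits."""
    return " ".join(format(int(digit, 16), "04b") for digit in text)

print(hex(43981))           # 0xabcd
print(hex_to_bits("abcd"))  # 1010 1011 1100 1101
```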

### Questions

- In the table of integer specs above, the number of values is always a number that ends in the digit 6. Why is this?
- Integers with 128 bits would give us 2^{128} ≈ 3.4 × 10^{38} different values. Is this a useful data type, or is it excessive?

### Projects

More ★'s indicate higher difficulty level.

## Floating Point

To represent fractional values we need another data type in addition to integers. This is where floating point data types come in. Floating point data types can represent numbers that have both a whole part and a fractional part. There are two commonly used floating point representations: 32-bit singles and 64-bit doubles. The bit patterns of these representations are defined by the [[IEEE 754]] standard, first published in 1985.

Name | Bits | Precision (Decimal Digits) | Exponent Min | Exponent Max |
---|---|---|---|---|
Single | 32 | ~7 | ~-38 | ~+38 |
Double | 64 | ~16 | ~-308 | ~+308 |

It would seem that [[IEEE singles]] would result in faster computations because they are smaller data types (4 bytes) than [[IEEE doubles]] (8 bytes). In practice, modern desktop processors handle [[IEEE doubles]] natively, so singles generally offer no speed advantage for ordinary calculations. Floating point [[literals]] are therefore always represented as doubles.
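The difference in size between singles and doubles also shows up as a difference in precision. The sketch below uses Python 3's standard `struct` module to squeeze a double into the 4-byte single format and back; the helper name `to_single` is hypothetical.

```python
import struct

def to_single(x):
    """Round a Python float (a double) to the nearest 32-bit single and back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

value = 0.1
print(repr(value))             # 0.1
print(repr(to_single(value)))  # 0.10000000149011612

# A value that fits exactly in a single survives the round trip unchanged.
print(to_single(0.5) == 0.5)   # True
```

The round-tripped value agrees with the original to about 7 significant digits, which matches the precision listed for singles.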

Floating point [[literals]] are written using standard base 10 decimal notation. They can also be written in a form of [[scientific notation]] that uses the letter 'e' to indicate a power of ten multiplier such that a number like 1.23 × 10^{4} is written as `1.23e4`. When written in scientific notation we see how the Floating Point Specs listed above apply. The Precision in Decimal Digits refers to the maximum number of digits we can represent in the mantissa (number before the 'e'). The Exponent Min and Max refer to the smallest and largest value of the power of ten (number after the 'e').

Fixed Point | Scientific Notation |
---|---|
0.0 | 0e0 |
3.1415 | 3.1415e0 |
15.0 | 1.5e1 |
326.0 | 3.26e2 |
0.123 | 1.23e-1 |
-3.1415 | -3.1415e0 |
-0.001 | -1e-3 |
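The two notations can be compared directly in Python 3, and the interpreter reports the limits of its floating point type through `sys.float_info`. This is a sketch that assumes a platform where Python floats are IEEE doubles, which is the usual case; the variable names are made up for illustration.

```python
import sys

# Each pair holds the same number in decimal and in scientific notation.
pairs = [(3.1415, 3.1415e0), (15.0, 1.5e1), (326.0, 3.26e2),
         (0.123, 1.23e-1), (-0.001, -1e-3), (12300.0, 1.23e4)]
same_values = all(plain == sci for plain, sci in pairs)
print(same_values)  # True

# Limits of the double type as reported by the interpreter.
digits = sys.float_info.dig           # decimal digits that are always preserved
min_exp = sys.float_info.min_10_exp   # smallest usable power of ten
max_exp = sys.float_info.max_10_exp   # largest usable power of ten
print(digits, min_exp, max_exp)

# Going past the largest exponent overflows to infinity.
print(1e308 * 10)  # inf
```

On a typical system these limits come out close to the approximate figures in the Floating Point Specs table for doubles.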

### Questions

- What types of computations would benefit from the added precision of [[IEEE doubles]] over [[IEEE singles]]?
- What types of computations would benefit from the added exponent range of [[IEEE doubles]] over [[IEEE singles]]?
- What types of computations would be best done as integers instead of floating point?

### Projects

More ★'s indicate higher difficulty level.

## Booleans

The simplest of all basic data types is the [[Boolean]], named after [[George Boole]]. [[Booleans]] can only be one of two possible values: `False` or `True`. They are typically stored as the numbers 0 and 1, respectively. While the [[literal]] values `False` and `True` don't seem as varied or interesting as the other data types, we will see that they are extremely useful when used as the results of a [[Boolean expression]].
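A short Python 3 sketch shows both points: a comparison is a Boolean expression that produces `True` or `False`, and those values behave like the numbers 1 and 0. The variable names are hypothetical.

```python
age = 21
is_adult = age >= 18       # a Boolean expression evaluates to True or False
print(is_adult)            # True
print(3 > 5)               # False

# Booleans convert to the integers 0 and 1.
as_numbers = (int(False), int(True))
print(as_numbers)          # (0, 1)
print(True == 1, False == 0)  # True True
```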

### Questions

- Why are Boolean values named after George Boole?

### Projects

More ★'s indicate higher difficulty level.

## Characters and Strings

Computers store characters as integer codes based on a particular coding scheme such as [[ASCII]], [[ISO Latin-1]] or [[Unicode]]. Even so, we treat characters as a distinct type separate from integers, because integers support arithmetic operations whereas characters do not.

Python can encode ASCII or Unicode characters. The default in Python 2 is ASCII and the default in Python 3 is Unicode.

Character constants are represented only within strings, enclosed in either single or double quotes. Alternatively, the hexadecimal escape code for a character can be used inside a string by prefixing the code with `\x` or `\u` (depending on whether 2 or 4 hexadecimal digits are needed to represent the character).
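The link between characters and their integer codes can be explored with Python 3's built-in `ord` and `chr` functions. The following is a minimal sketch; the variable names are chosen only for illustration.

```python
code = ord("A")            # the integer code behind the character
print(code, hex(code))     # 65 0x41
letter = chr(0x41)         # back from the code to the character
print(letter)              # A

# Escape codes inside a string name a character by its hexadecimal code.
print("\x41" == "A")       # True
print(len("\u2660"))       # 1 (a single character, the spade suit)

# Characters are not integers: arithmetic on them is rejected.
try:
    "A" + 1
    arithmetic_allowed = True
except TypeError:
    arithmetic_allowed = False
print(arithmetic_allowed)  # False
```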

Thus strings are merely a sequence of zero or more characters enclosed in quotes. An empty string (or null string) has no characters. Python string [[literals]] are enclosed in single or double quotes.

Starting with Python 3, Python characters are by default encoded using multiple bytes so they can represent [[Unicode]] characters.

In Python 2, however, there are plain (ASCII) strings and Unicode strings. By default string literals are plain, and a `\u` escape sequence inside a plain string is not interpreted as a Unicode character. That is why you have to force the string to be a Unicode string by prefixing it with a `u` before the initial quote. Then you can use ASCII escape sequences for Unicode characters and get the desired result. If you write Unicode characters directly in a Python 2 source file, you must put the encoding comment `#coding: UTF-8` on the first or second line of the file.

Type | Literal |
---|---|
String | "ABCD" |
String | 'ABCD' |
String | "\x41\x42\x43\x44" |
String | "\u2660\u2665\u2663\u2666" |
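The behavior of string literals can be checked in Python 3 with a few comparisons. This is a sketch with made-up variable names; it also shows that the four card suit characters need several bytes each when encoded as UTF-8, which is why multi-byte encodings are required for [[Unicode]].

```python
double_quoted = "ABCD"
single_quoted = 'ABCD'
print(double_quoted == single_quoted)  # True, the quote style does not matter

empty = ""
print(len(empty))                      # 0, an empty string has no characters

suits = "\u2660\u2665\u2663\u2666"     # four card suit characters
print(len(suits))                      # 4 characters
print(len(suits.encode("utf-8")))      # 12 bytes, 3 per character
print(len("ABCD".encode("utf-8")))     # 4 bytes, 1 per ASCII character
```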

### Questions

- Why is ASCII support still important?
- Why is UTF-8 encoded Unicode backward compatible with ASCII?
- Why is Unicode support so important?
- Why is Unicode probably the last character coding scheme that will ever be developed?

### Projects

More ★'s indicate higher difficulty level.

## References

- [[Unicode Charts]]
- [[Unicode Lookup]]
- [[Python Unicode HOWTO]]
- [[Pragmatic Unicode]]