Lecture 4: Information Representation in Computers

ENGSCI 232 Computer Systems

The basic unit of information on modern computers is the bit.

A bit may assume one of two possible values: on/off; 0/1; true/false; 0V/5V, …

Because there are only two possible values, a single bit cannot represent anything more complicated than a Boolean (logical) variable.

If we combine bits, we can represent a greater variety of values.

1 bit can assume one of 2 possible values {0, 1}.
2 bits can assume one of 4 possible values {00, 01, 10, 11}.
3 bits can assume one of 8 possible values {000, 001, 010, 011, 100, 101, 110, 111}.
4 bits, termed a nibble, can assume one of 16 possible values {0000, 0001, …, 1111}.
8 bits, termed a byte, can assume one of 256 possible values {0000 0000, 0000 0001, …, 1111 1111}.
16 bits, often termed a 16-bit word, can assume one of 65536 possible values {00000000 00000000, 00000000 00000001, …, 11111111 11111111}.

$n$ / $2^n$
1 / 2
2 / 4
4 / 16
8 / 256
16 / 65536
32 / 4294967296
64 / 18446744073709551616
128 / 340282366920938463463374607431768211456
256 / 115792089237316195423570985008687907853269984665640564039457584007913129639936

The computer’s memory consists of a large sequence of bytes, with data stored in successive bytes. (Each byte actually is given a numerical address to help manage the data.)

We have to choose what meaning to associate with each state a byte can represent. For example, if we see 01001011 sitting in some byte of a computer’s memory, what does it represent? It can, in fact, represent many things, depending on the context.

Firstly, consider how we represent non-negative integers. These are also termed unsigned integers.

Base-k Representations of Nonnegative (Unsigned) Integers

Recall that the same concept may have different representations. The base (or radix) of a number system defines the range of possible values that a digit may have.

In a base 10 (decimal) number system the possible digits are {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}. Most humans are familiar with base 10.
In a base 2 (binary) number system the possible digits are {0, 1}.
In a base 8 (octal) number system the possible digits are {0, 1, …, 7}.
In a base 16 (hexadecimal) number system the possible digits are {0, 1, …, 9, A, B, C, D, E, F}.

The general form of an n-digit number represented in base k is

$$(b_{n-1} b_{n-2} \ldots b_2 b_1 b_0)_k$$

where $b_i$ is the value of the digit in position $i$.

The value of this number is

$$\text{value} = \sum_{i=0}^{n-1} b_i k^i$$

Examples:

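$42_{10} = 4 \times 10^1 + 2 \times 10^0$

$52_8 = 5 \times 8^1 + 2 \times 8^0 \; (= 42_{10})$

$101010_2 = 1 \times 2^5 + 1 \times 2^3 + 1 \times 2^1 = 42_{10}$

$2A_{16} = 2 \times 16^1 + 10 \times 16^0 = 42_{10}$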
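The positional-value sum can be checked mechanically. The following is a minimal sketch in Python (used here for illustration, since the notes themselves use VB); the function name `value_of` and the digit-list convention are assumptions made for this example.

```python
def value_of(digits, k):
    """Value of the digits [b_{n-1}, ..., b_1, b_0] interpreted in base k."""
    total = 0
    # reversed() makes position i = 0 correspond to the least significant digit
    for i, b in enumerate(reversed(digits)):
        if not 0 <= b < k:
            raise ValueError(f"digit {b} is not valid in base {k}")
        total += b * k**i
    return total


print(value_of([4, 2], 10))             # decimal 42
print(value_of([5, 2], 8))              # octal 52
print(value_of([1, 0, 1, 0, 1, 0], 2))  # binary 101010
print(value_of([2, 10], 16))            # hexadecimal 2A (A has the value 10)
```

Each of the four calls evaluates the same sum as the worked examples, so all four print 42.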

Notes:

$b_{n-1}$ is called the most significant digit, and $b_0$ is called the least significant digit.
In base 2, $b_{n-1}$ is called the most significant bit (msb), and $b_0$ is called the least significant bit (lsb). For example, in 0101001010111 the leftmost bit (0) is the msb and the rightmost bit (1) is the lsb.

Addition in Base k

We use the standard rules to add in base k, making sure we ‘carry’ whenever we get a value of k or larger.

Example:

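$$\begin{array}{r} 000100101_2 \\ +\; 001100100_2 \\ \hline 010001001_2 \end{array}$$

$$\begin{array}{r} 0120002110_3 \\ +\; 0012002102_3 \\ \hline 0202011212_3 \end{array}$$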
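The carry rule can be sketched in a few lines of Python (an illustrative choice; the notes use VB). The function name `add_base_k` is hypothetical, and the sketch assumes bases from 2 to 10 so that every digit is a single decimal character.

```python
def add_base_k(x, y, k):
    """Add two digit strings written in base k (2 <= k <= 10)."""
    width = max(len(x), len(y))
    x, y = x.zfill(width), y.zfill(width)   # pad the shorter number with zeros
    carry = 0
    out = []
    # Work from the least significant digit towards the most significant one
    for dx, dy in zip(reversed(x), reversed(y)):
        s = int(dx) + int(dy) + carry
        out.append(str(s % k))   # digit that stays in this column
        carry = s // k           # carry whenever the column sum reaches k
    if carry:
        out.append(str(carry))
    return "".join(reversed(out))


print(add_base_k("000100101", "001100100", 2))
print(add_base_k("0120002110", "0012002102", 3))
```

The two calls reproduce the binary and base 3 sums worked above.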

Converting from Decimal into Base k

The remainder method can be used to convert a base 10 integer to its equivalent base-k value.

Remainder Method:

Let value = $(c_{n-1} c_{n-2} \ldots c_2 c_1 c_0)_{10}$.

First divide value by k; the remainder is the least significant digit $b_0$.
Divide the quotient by k; the remainder is $b_1$.
Continue this process until the quotient is 0; the final remainder is the most significant digit, $b_{n-1}$.

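Formally, we can put this into a VB algorithm, where b(i) holds digit $b_i$:

i = 0
Do While value > 0
    b(i) = value Mod k   ' the remainder is the next digit
    value = value \ k    ' integer division discards the remainder
    i = i + 1
Loop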

Notes: In VB,

x mod y gives the remainder when x is divided by y
x \ y divides two numbers and returns an integer result, discarding any fractional component

Example: What is the base 3 representation of $42_{10}$?
42/3 = 14, remainder 0, so $b_0 = 0$ (least significant digit)
14/3 = 4, remainder 2, so $b_1 = 2$
4/3 = 1, remainder 1, so $b_2 = 1$
1/3 = 0, remainder 1, so $b_3 = 1$
Thus $42_{10} = 1120_3$

Check: $1120_3 = 1 \times 3^3 + 1 \times 3^2 + 2 \times 3^1 + 0 \times 3^0 = 42_{10}$

Example: What is the base 2 (binary) representation of $42_{10}$?

Base = 2
value / value \ 2 / value mod 2
42 / 21 / 0 (lsb)
21 / 10 / 1
10 / 5 / 0
5 / 2 / 1
2 / 1 / 0
1 / 0 / 1 (msb)
Answer: $42_{10} = 101010_2$

Example: What is the base 16 (hexadecimal) representation of $42_{10}$?
42/16 = 2, remainder 10, so $b_0 = A$ (least significant digit)
2/16 = 0, remainder 2, so $b_1 = 2$
Thus $42_{10} = 2A_{16}$

Check: $2A_{16} = 2 \times 16^1 + 10 \times 16^0 = 42_{10}$
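The remainder method translates directly into code. Below is a minimal sketch in Python (chosen for illustration; the notes use VB), assuming a hypothetical helper `to_base` and bases no larger than 16.

```python
DIGITS = "0123456789ABCDEF"


def to_base(value, k):
    """Convert a non-negative integer to a digit string in base k (2..16)."""
    if value == 0:
        return "0"
    digits = []
    while value > 0:
        digits.append(DIGITS[value % k])  # remainder gives the next digit, lsd first
        value //= k                       # integer division, like VB's \ operator
    return "".join(reversed(digits))      # most significant digit goes first


print(to_base(42, 3))
print(to_base(42, 2))
print(to_base(42, 16))
```

The three calls correspond to the base 3, base 2 and base 16 examples above.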

Common Hexadecimal & Binary Values

Almost all computers use base 2 at the hardware level, but programmers prefer to think in octal or, more commonly, in hexadecimal. Hexadecimal is particularly convenient as 1 hexadecimal digit allows 16 combinations, corresponding to a nibble, while 2 digits allow 256 different values, matching a byte. The 16 hexadecimal values are shown in the table.
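The correspondence between one hexadecimal digit and one nibble can be listed with a short Python sketch (Python is an illustrative choice here, and the helper name `hex_table` is hypothetical).

```python
def hex_table():
    """Rows of (decimal value, hexadecimal digit, 4-bit binary pattern)."""
    return [(v, format(v, "X"), format(v, "04b")) for v in range(16)]


for dec, hx, nibble in hex_table():
    print(f"{dec:2d}  {hx}  {nibble}")
```

Because each hexadecimal digit maps to exactly four bits, a byte such as 0100 1011 can be read off nibble by nibble as 4B.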

Changing Base using Excel

There are some useful Excel functions for converting numbers into a string representation in some other base:

BIN2DEC: Converts a binary number to decimal
BIN2HEX: Converts a binary number to hexadecimal
BIN2OCT: Converts a binary number to octal
DEC2BIN: Converts a decimal number to binary
DEC2HEX: Converts a decimal number to hexadecimal
DEC2OCT: Converts a decimal number to octal
HEX2BIN: Converts a hexadecimal number to binary
HEX2DEC: Converts a hexadecimal number to decimal
HEX2OCT: Converts a hexadecimal number to octal
OCT2BIN: Converts an octal number to binary
OCT2DEC: Converts an octal number to decimal
OCT2HEX: Converts an octal number to hexadecimal

Examples: HEX2BIN("2F") = "101111", DEC2BIN(123) = "1111011"
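The same conversions can be cross-checked outside Excel. This sketch uses only Python built-ins (`int` with a base argument and `format`), not the Excel functions themselves; the variable names are illustrative.

```python
# Parse a digit string in a given base, then format the value in another base.
n = int("2F", 16)               # hexadecimal string to integer
hex_to_bin = format(n, "b")     # integer to binary string
dec_to_bin = format(123, "b")   # decimal 123 to binary string
dec_to_hex = format(459, "X")   # decimal 459 to hexadecimal string
oct_to_dec = int("713", 8)      # octal string to integer

print(n, hex_to_bin, dec_to_bin, dec_to_hex, oct_to_dec)
```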

Non-Negative Integers, Hexadecimal and Overflow in Visual Basic

On computers, nonnegative (unsigned) integers are represented by a fixed number of bits (typically 8, 16, 32, and/or 64). Thus, there is only a finite set of numbers that can be represented:

With 8 bits, 0…255 ($00_{16}$…$FF_{16}$) can be represented;
With 16 bits, 0…65535 ($0000_{16}$…$FFFF_{16}$) can be represented.

VB offers the byte as an unsigned value between 0 and 255:

Dim i As Byte ' i can hold values between 0 and 255 inclusive

Note: If an operation on bytes has a result outside this range, VB raises a run-time ‘overflow’ error. This error is sometimes called a range exception. In some languages, no error is generated and the stored result is silently wrong.

Dim i As Byte, j As Byte, k As Byte
i = 200
j = 200
k = i + j ' gives a run-time “overflow” error

We can immediately see why this overflow happens if we look at the binary, and remember we only have 8 bits to store the result in:

  $11001000_2$ = 200
+ $11001000_2$ = 200
= $110010000_2$ = 400 (too large; we have ‘overflowed’ into a ninth bit)
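To see what a language without overflow checking would store, the ninth bit can be masked off explicitly. This is a sketch in Python (illustrative; VB itself raises an error instead), and the function name `add_bytes` is hypothetical.

```python
def add_bytes(i, j):
    """Add two unsigned 8-bit values; return (stored byte, overflow flag)."""
    total = i + j
    stored = total & 0xFF       # keep only the low 8 bits
    overflowed = total > 0xFF   # a ninth bit was needed
    return stored, overflowed


print(add_bytes(200, 200))
print(add_bytes(100, 100))
```

For 200 + 200 the stored byte is 400 - 256 = 144 with the overflow flag set, while 100 + 100 = 200 fits in a byte.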

We can enter hexadecimal and octal values directly in VB as follows:

k = &HA2 ' set k = A2 base 16 (162 decimal)

k = &O75 ' set k = 75 base 8 (61 decimal)

A few other useful VB functions exist for converting decimal to hexadecimal or octal strings.

Dim s As String
s = Hex(5) ' Returns 5.
s = Hex(10) ' Returns A.
s = Hex(459) ' Returns 1CB.
s = Oct(4) ' Returns 4.
s = Oct(8) ' Returns 10.
s = Oct(459) ' Returns 713.

Signed Integer Representations

In many cases, we wish to represent both negative and positive integer values. In order to represent (signed) integer values we divide the range of available binary patterns into two approximately equal sections, one section to represent positive and the other to represent negative values. There are four standard methods for representing integers:

Sign-and-magnitude;
One’s complement;
Two’s complement;
Excess-value.

In most of these, positive integers are represented as if unsigned. These representations mainly differ in how they represent negative numbers.

Sign-and-magnitude for Signed Integers:

By convention the most significant bit holds the sign (0 for positive, 1 for negative) and the remaining bits hold the magnitude.

Example

$+42_{10} = 00101010_2$

and so

$-42_{10} = 10101010_2$ (sign-and-magnitude)

Under this representation, there are two representations of zero, namely +0 and –0.

In 8-bit sign-and-magnitude these two values are represented by

+0 = 00000000

and

–0 = 10000000.

If we have n bits in our representation then,

The most positive representable number using sign-and-magnitude is $2^{n-1}-1$
The most negative representable number using sign-and-magnitude is $-(2^{n-1}-1)$
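A sign-and-magnitude encoder is short enough to sketch in Python (an illustrative choice; the function name `to_sign_magnitude` is hypothetical). It prepends the sign bit to the (n-1)-bit magnitude.

```python
def to_sign_magnitude(x, n=8):
    """n-bit sign-and-magnitude pattern for x, as a string of 0s and 1s."""
    limit = 2 ** (n - 1) - 1
    if abs(x) > limit:
        raise ValueError(f"{x} does not fit in {n}-bit sign-and-magnitude")
    sign = "1" if x < 0 else "0"
    return sign + format(abs(x), f"0{n - 1}b")   # sign bit, then magnitude


print(to_sign_magnitude(42))
print(to_sign_magnitude(-42))
print(to_sign_magnitude(-127))
```

Note that this sketch always encodes zero as +0; the pattern 10000000 (–0) is valid but never produced here.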

One’s Complement for Signed Integers:

A one’s complement representation of a negative number is obtained from the n-bit binary representation of the positive number by complementing the bits, that is, replacing every 0 with a 1 and every 1 with a 0.

Example

$+42_{10} = 00101010_2$

and so

$-42_{10} = 11010101_2$ (one’s complement)

Again, there are two representations of zero. For example, using 8-bit one’s complement, zero is represented by

00000000

and

11111111

If we have n bits in our representation then,

The most positive representable one’s complement number is $2^{n-1}-1$
The most negative representable one’s complement number is $-(2^{n-1}-1)$
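The bit-flipping step can be sketched as follows in Python (illustrative; the function name `to_ones_complement` is hypothetical).

```python
def to_ones_complement(x, n=8):
    """n-bit one's complement pattern for x, as a string of 0s and 1s."""
    limit = 2 ** (n - 1) - 1
    if abs(x) > limit:
        raise ValueError(f"{x} does not fit in {n}-bit one's complement")
    bits = format(abs(x), f"0{n}b")          # pattern of the positive number
    if x < 0:
        # complement every bit: 0 becomes 1 and 1 becomes 0
        bits = "".join("1" if b == "0" else "0" for b in bits)
    return bits


print(to_ones_complement(42))
print(to_ones_complement(-42))
```

Complementing 00000000 gives 11111111, which is why this representation has two patterns for zero.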

Two’s Complement for Signed Integers

A two’s complement representation of a negative number is obtained from the n-bit binary representation of the positive number by complementing the bits, and then adding 1.

Example

$+42_{10} = 00101010_2$

and so

$-42_{10}$ = 11010101 (complement the bits)

+ 00000001 (add 1)

= $11010110_2$ (two’s complement)

Equivalently, subtract 1 from the positive number and then complement the bits. For example, 42 – 1 = 41,

$+41_{10} = 00101001_2$

and so

$-42_{10} = 11010110_2$ (two’s complement)

In two’s complement, there is only one representation of 0, being (for 8 bits) 00000000.

If we have n bits in our representation then,

The most positive representable two’s complement number is $2^{n-1}-1$
The most negative representable two’s complement number is $-2^{n-1}$

Two’s complement is the most widely used integer representation on modern computers.

Notes:

In two’s complement, we call the most significant bit the sign bit. If it is set, we have a negative number.
Regardless of the number of bits being used, -1 is given by …
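A two’s complement encoder and decoder can be sketched in Python (an illustrative choice; both function names are hypothetical). Rather than complementing and adding 1, the encoder uses the arithmetic shortcut x mod 2^n, which yields the same bit pattern.

```python
def to_twos_complement(x, n=8):
    """n-bit two's complement pattern for x, as a string of 0s and 1s."""
    if not -(2 ** (n - 1)) <= x <= 2 ** (n - 1) - 1:
        raise ValueError(f"{x} does not fit in {n}-bit two's complement")
    return format(x % 2 ** n, f"0{n}b")   # negative x wraps around to 2^n + x


def from_twos_complement(bits):
    """Signed value of a two's complement bit string."""
    value = int(bits, 2)
    if bits[0] == "1":            # sign bit set: subtract 2^n
        value -= 2 ** len(bits)
    return value


print(to_twos_complement(-42))
print(from_twos_complement("11010110"))
print(to_twos_complement(-1, 8), to_twos_complement(-1, 16))
```

The last line prints the pattern for -1 at two different widths, which is a convenient way to explore how that value looks as the number of bits changes.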

Two’s complement addition and subtraction

Two’s complement is popular because addition of two’s complement numbers is simple; we can (just about) treat the numbers as if they were unsigned, as long as we discard any overflow.

Example: Use 8-bit two’s complement to add $42_{10}$ and $56_{10}$
  $00101010_2$ (two’s complement) = $42_{10}$
+ $00111000_2$ (two’s complement) = $56_{10}$
= $01100010_2$ (two’s complement) = $98_{10}$

Example: Use 8-bit two’s complement to add $-42_{10}$ and $56_{10}$
  $11010110_2$ (two’s complement) = $-42_{10}$
+ $00111000_2$ (two’s complement) = $56_{10}$
= $100001110_2$, which is $00001110_2$ (two’s complement) = $14_{10}$ after discarding the 9th (overflow) bit

Subtraction involves negating then adding:

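Example: Use 8-bit two’s complement to subtract $56_{10}$ from $42_{10}$.

Now,

56 – 1 = 55 = $00110111_2$,

so complementing the bits gives –56 = $11001000_2$.

  $00101010_2$ (two’s complement) = $42_{10}$
+ $11001000_2$ (two’s complement) = $-56_{10}$
= $11110010_2$ (two’s complement) = $-14_{10}$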
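The three worked examples can be replayed with ordinary unsigned arithmetic plus a mask. This is a sketch in Python (illustrative; all names below are hypothetical) that assumes 8-bit values.

```python
MASK = 0xFF  # keep 8 bits, discarding any 9th bit


def encode(x):
    """Signed value in -128..127 to its 8-bit two's complement pattern."""
    return x & MASK


def decode(pattern):
    """8-bit pattern back to a signed value."""
    return pattern - 256 if pattern & 0x80 else pattern


def add8(a, b):
    """Add as if unsigned, then discard the overflow bit."""
    return decode((encode(a) + encode(b)) & MASK)


def sub8(a, b):
    """Subtract by negating b (complement, add 1) and then adding."""
    negated_b = (~encode(b) + 1) & MASK
    return decode((encode(a) + negated_b) & MASK)


print(add8(42, 56), add8(-42, 56), sub8(42, 56))
```

No range checking is done here, so results outside -128..127 wrap around silently.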

We said earlier that we could add two’s complement numbers as if they were unsigned. The only exception to this is the correct detection of overflow.

Example:

Consider the following 4-bit examples, and assume we ignore any 5th bit when interpreting the answer.

  $0001_2$ (two’s complement) = $+1_{10}$
+ $0100_2$ (two’s complement) = $+4_{10}$
= $0101_2$; true sum = $+5_{10}$; 4-bit value = 5; correct

  $0111_2$ (two’s complement) = $+7_{10}$
+ $0111_2$ (two’s complement) = $+7_{10}$
= $1110_2$; true sum = $+14_{10}$; 4-bit value = –2; incorrect

  $1110_2$ (two’s complement) = $-2_{10}$
+ $1111_2$ (two’s complement) = $-1_{10}$
= $11101_2$; true sum = $-3_{10}$; 4-bit value = –3; correct

  $1001_2$ (two’s complement) = $-7_{10}$
+ $1001_2$ (two’s complement) = $-7_{10}$
= $10010_2$; true sum = $-14_{10}$; 4-bit value = +2; incorrect

  $0111_2$ (two’s complement) = $+7_{10}$
+ $1111_2$ (two’s complement) = $-1_{10}$
= $10110_2$; true sum = $+6_{10}$; 4-bit value = 6; correct

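We test for overflow as follows. Let $x_3 x_2 x_1 x_0 + y_3 y_2 y_1 y_0 = z_4 z_3 z_2 z_1 z_0$ (all base 2), where $z_4$ is a temporary 5th bit in the answer that will be discarded before the result is stored.

Overflow has occurred if $x_3 = y_3$ but $z_4 \ne z_3$. In other words, the operands have the same sign but the stored result has the opposite sign.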
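The overflow test can be sketched in Python (illustrative; the function name `add4` is hypothetical). The sketch compares the sign bits of the two operands and then checks whether the discarded bit z4 differs from the stored sign bit z3.

```python
def add4(x, y):
    """Add two 4-bit patterns (0..15); return (stored 4-bit pattern, overflow)."""
    z = x + y                                   # may need a temporary 5th bit
    x3, y3 = (x >> 3) & 1, (y >> 3) & 1         # sign bits of the operands
    z3, z4 = (z >> 3) & 1, (z >> 4) & 1         # stored sign bit, discarded bit
    overflow = (x3 == y3) and (z4 != z3)
    return z & 0b1111, overflow


for x, y in [(0b0001, 0b0100), (0b0111, 0b0111), (0b1110, 0b1111),
             (0b1001, 0b1001), (0b0111, 0b1111)]:
    stored, overflow = add4(x, y)
    print(f"{x:04b} + {y:04b} = {stored:04b}  overflow={overflow}")
```

The five pairs are the five 4-bit examples above; overflow is flagged for 7 + 7 and for (-7) + (-7) only.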

Excess-value for Signed Integers (“Move the Zero”)

For an n-bit binary number a fixed value of $2^{n-1}$ is added to the value to obtain its excess-value (or “excess-$2^{n-1}$”) representation. Sometimes we use a more general “excess-y” representation, where y is some other value chosen to suit the problem.

Example 1 (4 bits):

If n = 4 bits, $2^{n-1} = 8$, so we use an excess-8 representation. Under this scheme, the representation for x is simply the unsigned representation for x + 8.

Example 2 (8 bits):

If n = 8 bits, $2^{n-1} = 128$, so we use an excess-128 representation. The representations are

$+42_{10} + 128_{10} = 170_{10}$

= $10101010_2$ (excess-128)

$-42_{10} + 128_{10} = 86_{10}$

= $01010110_2$ (excess-128)

There is only one representation for zero, namely the unsigned value $2^{n-1}$ (10000000 for 8 bits).

If we have n bits in our representation then,

The most positive excess-$2^{n-1}$ representable number is $2^{n-1}-1$
The most negative excess-$2^{n-1}$ representable number is $-2^{n-1}$
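Excess-value encoding is a single addition followed by an unsigned conversion. A minimal Python sketch (illustrative; the function name `to_excess` is hypothetical) for the excess-2^(n-1) case:

```python
def to_excess(x, n=8):
    """n-bit excess-2^(n-1) pattern for x, as a string of 0s and 1s."""
    bias = 2 ** (n - 1)
    if not -bias <= x <= bias - 1:
        raise ValueError(f"{x} does not fit in {n}-bit excess-{bias}")
    return format(x + bias, f"0{n}b")   # store x + bias as an unsigned number


print(to_excess(42))
print(to_excess(-42))
print(to_excess(0))
```

A useful property of this scheme is that the stored patterns sort in the same order as the values they represent, with the most negative value stored as all zeros.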

Signed Integers in VB

VB uses 2’s complement integers. The following types are available:

Dim i As Integer ' 2 byte integer, -32,768 to 32,767

Dim j As Long ' 4 byte integer, -2,147,483,648 to 2,147,483,647

Floating point Representation

In the decimal system, the decimal point indicates the start of negative powers of 10.

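$12.34 = 1 \times 10^1 + 2 \times 10^0 + 3 \times 10^{-1} + 4 \times 10^{-2}$

If we are using a system in base $\beta$ (i.e. the radix is $\beta$), the ‘radix point’ serves the same function:

$101.101_2 = 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 1 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3}$
$= 4_{10} + 1_{10} + 0.5_{10} + 0.125_{10}$
$= 5.625_{10}$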
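Digits after the radix point can be evaluated with negative powers of the base. The following Python sketch is illustrative only; the function name `parse_radix` is hypothetical and it assumes a non-negative input with at most one radix point.

```python
def parse_radix(text, base):
    """Value of a string such as '101.101' written in the given base."""
    whole, _, frac = text.partition(".")
    value = int(whole, base) if whole else 0
    # Digit i after the radix point is weighted by base^(-i)
    for i, ch in enumerate(frac, start=1):
        value += int(ch, base) * base ** -i
    return value


print(parse_radix("101.101", 2))
print(parse_radix("12.34", 10))
```

The binary example is exact because 0.5 and 0.125 are powers of two; the decimal example is subject to the usual floating point rounding.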

A floating point representation allows a large range of numbers to be represented in a relatively small number of digits by separating the digits used for precision from the digits used for range.

Example: The following have 4 decimal digits of precision, but together cover a large range of values

$1234 = 1.234 \times 10^3$
$0.000000001234 = 1.234 \times 10^{-9}$
$12{,}340{,}000{,}000 = 1.234 \times 10^{10}$

Note that the same number can be represented in different ways. For example the following are all the same value

$3.14159 \times 10^0$ (normalised)
$0.314159 \times 10^1$
$31.4159 \times 10^{-1}$
$3141.59 \times 10^{-3}$

To avoid multiple representations of the same number, floating point numbers are usually normalised so that there is exactly one digit, which must be nonzero, to the left of the ‘radix’ point. This digit is called the leading digit.

In general, a normalised (non-zero) floating-point number will be represented using

$$(-1)^s \; d_0 . d_1 d_2 \ldots d_{p-1} \times \beta^e,$$

where

s is the sign,
$d_0 . d_1 d_2 \ldots d_{p-1}$, termed the significand, has p significant digits,
each digit satisfies $0 \le d_i < \beta$,
e, with $e_{min} \le e \le e_{max}$, is the exponent,
$\beta$ is the base (or radix).

Example:

If $\beta = 10$ (base 10) and p = 3, the number 0.1 is represented as $1.00 \times 10^{-1}$.
If $\beta = 2$ (base 2) and p = 24, the decimal number 0.1 cannot be represented exactly but is approximately $1.10011001100110011001101_2 \times 2^{-4}$.

Formally, $(-1)^s \; d_0 . d_1 d_2 \ldots d_{p-1} \times \beta^e$ represents the value $(-1)^s \, (d_0 + d_1 \beta^{-1} + d_2 \beta^{-2} + \ldots + d_{p-1} \beta^{-(p-1)}) \, \beta^e$.

Precision & Range allowed by a Representation

The range of numbers that can be represented is determined by the number of digits in the exponent (i.e. by $e_{max}$) and the base $\beta$ to which it is raised, while the precision is determined by the number of digits p in the significand and its base $\beta$.

Example: If $\beta = 2$ (base 2), there are p = 3 digits in the significand, the exponent ranges from $e_{min} = -1$ to $e_{max} = 2$, and the numbers are normalised, then

the range of values (most negative to most positive) we can store is $-1.11_2 \times 2^2$ to $+1.11_2 \times 2^2$ (that is, –7 to +7);
the smallest (closest to zero) positive normalised value we can store is $1.00_2 \times 2^{-1}$ (that is, 0.5).

The full list of normalised positive values that can be represented is shown in the table and on the number line below.
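The positive normalised values of this small format can also be enumerated directly. This Python sketch is illustrative; the function name `normalised_values` is hypothetical and the defaults are the parameters of the example (base 2, p = 3, exponents from -1 to 2).

```python
def normalised_values(p=3, e_min=-1, e_max=2):
    """All positive normalised values 1.d1...d(p-1) x 2^e, in increasing order."""
    values = []
    for e in range(e_min, e_max + 1):
        for frac in range(2 ** (p - 1)):            # the p-1 digits after the point
            significand = 1 + frac / 2 ** (p - 1)   # leading digit is always 1
            values.append(significand * 2.0 ** e)
    return values


vals = normalised_values()
print(len(vals), vals)
```

There are 4 significands for each of the 4 exponents, giving 16 values from 0.5 up to 7.0. The spacing between neighbouring values doubles each time the exponent increases by one.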

Hidden Bits in Binary Representations for Floating Point Numbers

For a significand represented as a binary number ($\beta = 2$) the normalisation condition requires that the leading digit is always 1.
There is thus no need to store the leading digit in a binary significand.
This bit is referred to as the hidden bit.
For example, if we have $1.101101_2 \times 2^2$, for the significand we only need to store 101101.

IEEE 754 floating point standard

There are many ways to represent floating point numbers. In order to improve portability most computers use the IEEE 754 floating point standard. There are two primary formats:

32 bit single precision; and
64 bit double precision.

IEEE 754 Single precision floating point standard

Single precision consists of:

a single sign bit, 0 for positive and 1 for negative;
an 8-bit base-2 ($\beta = 2$) excess-127 exponent, with $e_{min} = -126$ (stored as $127_{10} - 126_{10} = 1_{10} = 00000001_2$) and $e_{max} = 127$ (stored as $127_{10} + 127_{10} = 254_{10} = 11111110_2$). Note that stored values $00000000_2 = 0_{10}$ and $11111111_2 = 255_{10}$ are reserved for special numbers;
a 23-bit base-2 ($\beta = 2$) significand, with a hidden bit giving a precision of 24 bits (i.e. $1.d_1 d_2 \ldots d_{23}$).

Notes

Single precision has 24 bits precision, equivalent to about 7.2 decimal digits.
The largest representable non-infinite number is almost $2 \times 2^{127} \approx 3.402823 \times 10^{38}$.
The smallest representable non-zero normalised number is $1 \times 2^{-126} \approx 1.17549 \times 10^{-38}$.
Denormalised numbers (e.g. $0.01_2 \times 2^{-126}$) can be represented.
There are two zeros, ±0.
There are two infinities, ±∞.
A NaN (not a number) is used for results from undefined operations such as sqrt(-1).
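The three single precision fields can be inspected with Python's standard `struct` module, which packs a value into the 4-byte IEEE 754 format. This is an illustrative sketch; the function name `float32_fields` is hypothetical.

```python
import struct


def float32_fields(x):
    """Return (sign bit, stored exponent, 23-bit stored fraction) of x as a float32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret 4 bytes as an integer
    sign = bits >> 31
    stored_exponent = (bits >> 23) & 0xFF   # excess-127 value
    fraction = bits & 0x7FFFFF              # hidden bit is not stored
    return sign, stored_exponent, fraction


print(float32_fields(1.0))
print(float32_fields(-2.0))
sign, exp, frac = float32_fields(0.1)
print(sign, exp - 127, format(frac, "023b"))
```

For 0.1 the true exponent is -4 (stored as 123) and the stored fraction is the 23 digits after the radix point of the rounded binary significand.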

Text Representation

The ASCII standard was developed in 1963. ASCII -- American Standard Code for Information Interchange -- permitted machines from different manufacturers to exchange data. ASCII consists of 128 binary values (0 to 127), each associated with a character or command. ASCII was developed a long time ago and now the non-printing characters are rarely used for their original purpose. ASCII was actually designed for use with teletypes and so the descriptions are somewhat obscure.

If someone says they want your file in ASCII format, all this means is they want 'plain' text with no formatting such as different fonts, different font sizes, bold or underline. They want the raw ASCII format that any computer can understand. This is usually so they can easily import the file into their own applications without issues. Notepad.exe creates ASCII text, or in MS Word you can save a file as “text only”.

Non-Printing Characters / Printing Characters
Name / Ctrl char / Dec / Hex / Char / Dec / Hex / Char / Dec / Hex / Char / Dec / Hex / Char
null / ctrl-@ / 0 / 00 / NUL / 32 / 20 / Space / 64 / 40 / @ / 96 / 60 / `
start of heading / ctrl-A / 1 / 01 / SOH / 33 / 21 / ! / 65 / 41 / A / 97 / 61 / a
start of text / ctrl-B / 2 / 02 / STX / 34 / 22 / " / 66 / 42 / B / 98 / 62 / b
end of text / ctrl-C / 3 / 03 / ETX / 35 / 23 / # / 67 / 43 / C / 99 / 63 / c
end of transmission / ctrl-D / 4 / 04 / EOT / 36 / 24 / $ / 68 / 44 / D / 100 / 64 / d
enquiry / ctrl-E / 5 / 05 / ENQ / 37 / 25 / % / 69 / 45 / E / 101 / 65 / e
acknowledge / ctrl-F / 6 / 06 / ACK / 38 / 26 / & / 70 / 46 / F / 102 / 66 / f
bell / ctrl-G / 7 / 07 / BEL / 39 / 27 / ' / 71 / 47 / G / 103 / 67 / g
backspace / ctrl-H / 8 / 08 / BS / 40 / 28 / ( / 72 / 48 / H / 104 / 68 / h
horizontal tab / ctrl-I / 9 / 09 / HT / 41 / 29 / ) / 73 / 49 / I / 105 / 69 / i
line feed / ctrl-J / 10 / 0A / LF / 42 / 2A / * / 74 / 4A / J / 106 / 6A / j
vertical tab / ctrl-K / 11 / 0B / VT / 43 / 2B / + / 75 / 4B / K / 107 / 6B / k
form feed / ctrl-L / 12 / 0C / FF / 44 / 2C / , / 76 / 4C / L / 108 / 6C / l
carriage return / ctrl-M / 13 / 0D / CR / 45 / 2D / - / 77 / 4D / M / 109 / 6D / m
shift out / ctrl-N / 14 / 0E / SO / 46 / 2E / . / 78 / 4E / N / 110 / 6E / n
shift in / ctrl-O / 15 / 0F / SI / 47 / 2F / / / 79 / 4F / O / 111 / 6F / o
data link escape / ctrl-P / 16 / 10 / DLE / 48 / 30 / 0 / 80 / 50 / P / 112 / 70 / p
device control 1 / ctrl-Q / 17 / 11 / DC1 / 49 / 31 / 1 / 81 / 51 / Q / 113 / 71 / q
device control 2 / ctrl-R / 18 / 12 / DC2 / 50 / 32 / 2 / 82 / 52 / R / 114 / 72 / r
device control 3 / ctrl-S / 19 / 13 / DC3 / 51 / 33 / 3 / 83 / 53 / S / 115 / 73 / s
device control 4 / ctrl-T / 20 / 14 / DC4 / 52 / 34 / 4 / 84 / 54 / T / 116 / 74 / t
negative acknowledge / ctrl-U / 21 / 15 / NAK / 53 / 35 / 5 / 85 / 55 / U / 117 / 75 / u
synchronous idle / ctrl-V / 22 / 16 / SYN / 54 / 36 / 6 / 86 / 56 / V / 118 / 76 / v
end of transmission block / ctrl-W / 23 / 17 / ETB / 55 / 37 / 7 / 87 / 57 / W / 119 / 77 / w
cancel / ctrl-X / 24 / 18 / CAN / 56 / 38 / 8 / 88 / 58 / X / 120 / 78 / x
end of medium / ctrl-Y / 25 / 19 / EM / 57 / 39 / 9 / 89 / 59 / Y / 121 / 79 / y
substitute / ctrl-Z / 26 / 1A / SUB / 58 / 3A / : / 90 / 5A / Z / 122 / 7A / z
escape / ctrl-[ / 27 / 1B / ESC / 59 / 3B / ; / 91 / 5B / [ / 123 / 7B / {
file separator / ctrl-\ / 28 / 1C / FS / 60 / 3C / < / 92 / 5C / \ / 124 / 7C / |
group separator / ctrl-] / 29 / 1D / GS / 61 / 3D / = / 93 / 5D / ] / 125 / 7D / }
record separator / ctrl-^ / 30 / 1E / RS / 62 / 3E / > / 94 / 5E / ^ / 126 / 7E / ~
unit separator / ctrl-_ / 31 / 1F / US / 63 / 3F / ? / 95 / 5F / _ / 127 / 7F / DEL
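Entries in the table can be looked up programmatically. This sketch uses Python built-ins (`ord`, `chr`, and `str.encode`) purely as an illustration; the variable names are hypothetical.

```python
code_of_A = ord("A")                  # character to decimal code
char_61 = chr(0x61)                   # hexadecimal code 61 to character
codes = list("Hi".encode("ascii"))    # text to a list of byte values
line_feed_is_control = ord("\n") < 32 # codes below 32 are non-printing

# Upper and lower case letters differ only in one bit (value 32, hex 20)
case_gap = ord("a") - ord("A")

print(code_of_A, char_61, codes, line_feed_is_control, case_gap)
```

Encoding with "ascii" raises an error for any character outside the 128 codes in the table, which is one way to check that a string is plain ASCII text.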