Notes from April 5 – Thursday

  • Here is a method of converting base-10 fractions (numbers less than 1) into fractions of other bases. A fraction F can be written as

L1*B^(-1)+L2*B^(-2)+L3*B^(-3) and so on, in which L1, L2 and L3 are literals (digits) of the base B. If F is multiplied by B, the integer part of the product is L1 and the new fraction is

L2*B^(-1)+L3*B^(-2) and so on. The operation therefore yields the most significant literal of the fraction. If the process is repeated on the new fraction, without the extracted integer, the next literal is extracted. The process is repeated until the new fraction becomes 0. For many numbers that never happens, so the number of extracted literals is infinite.

  • For example, the following process converts 0.671875 in base-ten to Binary:

0.671875*2=1.34375 Extract 1 new fraction=0.34375

0.34375 *2=0.6875 Extract 0 new fraction=0.6875

0.6875 *2=1.375 Extract 1 new fraction=0.375

0.375 *2=0.75 Extract 0 new fraction=0.75

0.75 *2=1.5 Extract 1 new fraction=0.5

0.5 *2=1.0 Extract 1 new fraction=0.0

Since the first literal is the most significant, the result is 0.101011 binary.
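
The multiply-and-extract procedure above can be sketched in Python. This is only a minimal sketch: the function name fraction_digits and the max_digits cutoff are assumptions introduced here, and exact rational arithmetic (fractions.Fraction) is used so that float round-off does not disturb the extracted literals.

```python
from fractions import Fraction

def fraction_digits(value, base, max_digits):
    """Return the leading literals of a fraction 0 <= value < 1 in the given base."""
    frac = Fraction(value)          # exact arithmetic, no float round-off
    digits = []
    while frac != 0 and len(digits) < max_digits:
        frac *= base
        digit = int(frac)           # the extracted integer is the next literal
        digits.append(digit)
        frac -= digit               # continue with the new fraction only
    return digits

print(fraction_digits("0.671875", 2, 10))   # [1, 0, 1, 0, 1, 1]
```

The max_digits cutoff is there because, for many inputs, the fraction never reaches 0 and the loop would otherwise not terminate.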

  • The above calculation can be performed faster using base-16:

0.671875*16=10.75 Extract A new fraction=0.75

0.75 *16=12.00 Extract C new fraction=0.0

The result is 0.AC in base-16. Using the hex to binary conversion, it can be written as 0.1010 1100, which is the same result (the two trailing zeros do not change the value).
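
The hex to binary step can be sketched as a per-digit expansion, each hex digit becoming one 4-bit group. The helper name hex_fraction_to_binary is an assumption made for this illustration.

```python
def hex_fraction_to_binary(hex_digits):
    """Expand each hex digit of a fraction into its 4-bit binary group."""
    groups = [format(int(digit, 16), "04b") for digit in hex_digits]
    return "0." + " ".join(groups)

print(hex_fraction_to_binary("AC"))  # 0.1010 1100
```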

  • Fractions in binary are quite different from those in base-ten. 0.1 in binary means 1 x 2^(-1), or ½, or 0.5 in base-10. 0.11 in binary is 0.75 in base-10, i.e., 1/2+1/4.
  • A number that is exact in base-ten, such as 0.3 or 0.4, can be never-ending in binary. See the conversion of 0.4:

0.4 *16=6.4 Extract 6 new fraction=0.4

0.4 *16=6.4 Extract 6 new fraction=0.4

The result is 0.6666666… in hexadecimal. If only one Hex digit is kept, 0.6HEX = 6/16 = 0.375, a rough approximation of 0.4. If two Hex digits are kept, 0.66HEX = 6/16 + 6/16/16 = 0.3984375, a better approximation of 0.4. If 3 Hex digits are used, 0.666HEX = 6/16 + 6/16/16 + 6/16/16/16 = 0.39990234. Therefore, the more bits used, the more accurate the representation. By the way, the last kept 6 is not rounded upward because the first dropped digit, 6, is less than half of 16 (a dropped digit of 8 or above would round the kept digit up in HEX).
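
The approximations of 0.4 above can be checked with a short sketch that keeps only the first n hex digits and drops the rest without rounding. The name truncated_hex_value is an assumption made here, and the decimal input is passed as a string so that Fraction holds it exactly.

```python
from fractions import Fraction

def truncated_hex_value(value, n_digits):
    """Value of a fraction when only n_digits hex digits are kept (no rounding)."""
    frac = Fraction(value)
    scale = 16 ** n_digits
    return Fraction(int(frac * scale), scale)   # drop everything past n_digits

for n in (1, 2, 3):
    print(n, float(truncated_hex_value("0.4", n)))
```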

  • Consider the conversion of another example, 0.47 in base 10.

0.47 *16=7.52 Extract 7 new fraction=0.52

0.52 *16=8.32 Extract 8 new fraction=0.32

0.32 *16=5.12 Extract 5 new fraction=0.12

0.12 *16=1.92 Extract 1 new fraction=0.92

0.92 *16=E.72 Extract E new fraction=0.72

0.72 *16=B.52 Extract B new fraction=0.52

0.52 *16=8.32 Extract 8 new fraction=0.32

The result is 0.7851EB851EB851EB… in hexadecimal. The pattern 851EB repeats infinitely. Therefore, if only a finite number of bits is available, the precision is limited. The representations of 0.47, rounded to the available number of bits, are:

0.47 base-ten   4 bits: 0.8HEX       approximation=0.50

0.47 base-ten   8 bits: 0.78HEX      approximation=0.46875

0.47 base-ten  12 bits: 0.785HEX     approximation=0.4699707

0.47 base-ten  16 bits: 0.7852HEX    approximation=0.47000122

0.47 base-ten  20 bits: 0.7851FHEX   approximation=0.47000027

The 24th bit has a weight of 2^(-24), about 0.00000006, which rounds up into the 7th base-10 digit. The 25th bit has a weight of 2^(-25), about 0.00000003, which would not affect the 7th base-10 digit. To gain 7 digits of base-10 accuracy, 24 bits are therefore needed (that is why the IEEE single precision format is a work of genius).
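
The effect of the number of bits on the representation of 0.47 can be sketched as follows. The function name round_to_bits is an assumption introduced here, and rounding is done by adding one half before truncating, which sends an exact tie upward.

```python
from fractions import Fraction

def round_to_bits(value, n_bits):
    """Round a fraction 0 <= value < 1 to n_bits binary places."""
    frac = Fraction(value)
    scale = 2 ** n_bits
    mantissa = int(frac * scale + Fraction(1, 2))   # add half, then truncate
    return mantissa, Fraction(mantissa, scale)

for bits in (4, 8, 12, 16, 20, 24):
    mantissa, approx = round_to_bits("0.47", bits)
    # bits // 4 hex digits are printed after the point
    print(f"{bits:2d} bits: 0.{mantissa:0{bits // 4}X} hex = {float(approx):.8f}")
```

Each extra hex digit shrinks the error by roughly a factor of 16, which is the trend the list of approximations shows.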

  • The following is a discussion concerning the IEEE (Institute of Electrical and Electronics Engineers) floating point formats. The 64-bit format is twice as long as the 32-bit format, but it is much simpler to form. The 64 bits consist of 1 sign bit, an 11-bit excess-1023 exponent and a 52+1 bit normalized fraction. The normalized fraction is in the form 1.nf; since the leading 1 is always present, it is never stored and only the 52-bit nf is stored. Thus the 52+1 nomenclature. The sign bit holds the sign of the number. The “excess-1023” format is defined so that the exponent needs no sign of its own: 1023 is subtracted from the 11-bit integer exp, so about half of its values give negative exponents and the other half positive ones. The floating point number, N, based on the definition, is then

N= (-1)^s * 2^(exp-1023) * 1.nf

and the 64 bits are stored as

seee eeee eeee ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff ffff
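
The bit layout and the formula for N can be cross-checked with a small sketch that splits a Python float (which is stored as an IEEE double) into its three fields and then rebuilds the value. The function names and the sample value -6.5 are assumptions chosen for this illustration, and the rebuild only covers normalized numbers.

```python
import struct

def double_fields(x):
    """Split a Python float (an IEEE double) into s, exp and nf."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))  # the raw 64 bits
    s = bits >> 63                     # 1 sign bit
    exp = (bits >> 52) & 0x7FF         # 11-bit excess-1023 exponent
    nf = bits & ((1 << 52) - 1)        # 52 stored fraction bits
    return s, exp, nf

def value_from_fields(s, exp, nf):
    """Evaluate N = (-1)^s * 2^(exp-1023) * 1.nf for a normalized number."""
    return (-1) ** s * 2.0 ** (exp - 1023) * (1 + nf / 2 ** 52)

s, exp, nf = double_fields(-6.5)
print(s, exp, f"{nf:013X}")
print(value_from_fields(s, exp, nf))
```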

  • Example: convert 47.11 to IEEE Double Precision format. Step 1,

normalize 47.11 by writing it as (47.11/32)*32, in which 32 is 2^5. The normalized number is then 1.4721875 x 2^5. The sign is positive, so s=0. The number exp-1023=5, therefore, exp=1023+5=1028=404HEX. The first 12 bits are then 0100 0000 0100 since the sign bit is 0. The conversion of .nf=.4721875 gives 0.78E147AE147AE147AE147A…. Collecting 52 bits, or 13 HEX digits, with rounding (the 14th digit is 1, so the 13th digit is not rounded up), the answer is

40478E147AE147AE

This answer can be obtained easily using the matlab command num2hex(47.11).
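
For readers without matlab, a similar check can be sketched in Python with the standard struct module. The helper name num2hex_double is an assumption made here to mirror the matlab command; it is not a built-in Python function.

```python
import struct

def num2hex_double(x):
    """Hex string of the 8 bytes of x stored as a big-endian IEEE double."""
    return struct.pack(">d", x).hex().upper()

print(num2hex_double(47.11))
```

The printed string can be compared digit by digit with the hand-derived result: the first 3 HEX digits hold the sign and exponent, the remaining 13 hold nf.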

  • From the above example, if -47.11 is converted, only the sign bit would change. The first four bits, 0100, would then be 1100 due to the sign change, so the first HEX digit would be C instead of 4. The IEEE representation of -47.11 is then C0478E147AE147AE.
  • The following is a discussion concerning the IEEE (Institute of Electrical and Electronics Engineers) 32-bit single precision floating point format. The 32 bits consist of 1 sign bit, an 8-bit excess-127 exponent and a 23+1 bit normalized fraction. The normalized fraction is in the form 1.nf; since the leading 1 is always present, it is never stored and only the 23-bit nf is stored. Thus the 23+1 nomenclature (the 24th bit is desperately needed for 7 digits of accuracy). The sign bit holds the sign of the number. The “excess-127” format is defined so that the exponent needs no sign of its own: 127 is subtracted from the 8-bit integer exp, so about half of its values give negative exponents and the other half positive ones. The floating point number, N, based on the definition, is then

N= (-1)^s * 2^(exp-127) * 1.nf

and the 32 bits are stored as

seee eeee efff ffff ffff ffff ffff ffff

Since the bits for s, e and f do not fit nicely into the first 3 HEX digits, the manipulation of the bits is more challenging than in the double precision format.

  • Example: convert 47.11 to IEEE Single Precision format. Step 1,

normalize 47.11 by writing it as (47.11/32)*32, in which 32 is 2^5. The normalized number is then 1.4721875 x 2^5. The sign is positive, so s=0. The number exp-127=5, therefore, exp=127+5=132=84HEX. For this particular case, it is better to write exp in binary, i.e., 1000 0100 using the Hex to Binary table. With the sign bit, the first 9 bits are 0100 0010 0. The first two HEX digits are 42HEX. The third HEX digit starts with the last exponent bit, 0, and must draw its other 3 bits from .nf=.4721875. Multiply .nf by 8 to obtain 3.7775, or 3.nf’. The 3 is 011 in binary, so the third HEX digit is 0011, or 3. Obtain 5 more HEX digits from .nf’ by multiplying it by 16 each time. The value of .nf’ in HEX is .C70A4 (the fifth digit, 3, is rounded up to 4 because the next digit is D). The packed format is then:

423C70A4

This answer can be obtained easily using the matlab command num2hex(single(47.11)).
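
The packing of the three fields into 32 bits can be sketched with shifts, which makes the awkward alignment explicit. The function name pack_single is an assumption made here; the field values are the ones derived by hand in the example above, and struct is used only as a cross-check.

```python
import struct

def pack_single(s, exp, nf):
    """Combine the sign bit, 8-bit exponent and 23-bit fraction into 32 bits."""
    return (s << 31) | (exp << 23) | nf

s = 0                              # positive number
exp = 127 + 5                      # excess-127 exponent for 2^5
nf = (0b011 << 20) | 0xC70A4       # 3 leading bits, then 5 hex digits (rounded)
word = pack_single(s, exp, nf)
print(f"{word:08X}")               # 423C70A4

# Cross-check against the machine's own single precision encoding
print(struct.pack(">f", 47.11).hex().upper())
```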

  • From the above example, if -47.11 is converted, only the sign bit would change. The first four bits, 0100, would then be 1100 due to the sign change, so the first HEX digit would be C instead of 4. The IEEE single precision representation of -47.11 is then C23C70A4.

Examples: Converting the number 0.05 to IEEE single precision coding.

Step 1: Put the number 0.05 in scientific notation using powers of 2. If you multiply 0.05 by 32, you get a number in the form of 1.nf, i.e., (0.05)(32)=1.6. The IEEE format wants the number that way so that the leading value of 1 does not have to be stored, thus saving space for more precision in the rest of the number.

Now the number can be rewritten as

0.05 = (-1)^0 (1.6) 2^(-5) = (-1)^s (1.nf) 2^(EXP-127)

It is obtained from the fact that 0.05= (0.05*32)(1/32)=(1.6)(2^(-5)).

Comparing the format with the number, we can conclude that the sign bit s=0 for a positive number. Also, EXP-127 is -5, so EXP=127-5=122. Now convert 122 from base-10 to base-2. Using dec2hex(122) in matlab, you get 122 (base-10) = 7A (base-16). With that and the HEX table, you can easily write 122 (base-10) in binary, i.e.,

122 (base-10) = 7A (base-16) = 0111 1010 (base-2)

The above 8 bits are to be stored in the 8 bits intended for EXP. Now obtain 23 more bits from the nf portion. We first need 3 bits, which fill out the hex digit started by the last bit of EXP, and then 5 sets of 4-bit numbers (each 4-bit number is one hex digit), since 3+5*4=23. To get 3 bits, multiply nf by 8; to get 4 bits, multiply by 16. See the following:

1.6: the leading 1 is not stored, so the bits come from the fraction 0.6

0.6 *8=4.8 Extract 4 (100 in binary) new fraction=0.8

0.8 *16=12.8 Extract C (1100 in binary) new fraction=0.8

0.8 *16=12.8 Extract C (1100 in binary) new fraction=0.8

0.8 *16=12.8 Extract C (1100 in binary) new fraction=0.8

0.8 *16=12.8 Extract C (1100 in binary) new fraction=0.8

0.8 *16=12.8 Extract C (1100 in binary) new fraction=0.8

(Since the fraction left after the last extraction is 0.8, which is more than half, the last 12 should be rounded up to 13, or D.)

Now the big construction begins:

0 0111 1010 100 1100 1100 1100 1100 1101

Now group the bits into 4-bits for easy writing in hex notation:

0011 1101 0100 1100 1100 1100 1100 1101 = 3D4CCCCD

That would be how the number 0.05 is stored in IEEE 32-bit floating point format. You could get this answer using the matlab command num2hex(single(0.05)) and the answer is 3D4CCCCD.
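
The hand procedure for single precision can be sketched step by step in Python. This is only an illustration under stated assumptions: the name encode_single is introduced here, the input must be positive and within the normalized range, and the leftover fraction is rounded up when it is at least one half, which differs from the IEEE rule only in the case of an exact tie.

```python
from fractions import Fraction

def encode_single(value):
    """Hand-style IEEE single encoding of a positive, normalized decimal value."""
    x = Fraction(value)
    e = 0
    while x >= 2:                # normalize to the form 1.nf
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    exp = 127 + e                # excess-127 exponent
    frac = (x - 1) * 8           # leading 1 not stored; take the first 3 bits
    nf = int(frac)
    frac -= nf
    for _ in range(5):           # then 5 hex digits of 4 bits each
        frac *= 16
        digit = int(frac)
        nf = (nf << 4) | digit
        frac -= digit
    if frac >= Fraction(1, 2):   # round up when the leftover is at least half
        nf += 1
    return (exp << 23) + nf      # sign bit is 0; a carry out of nf bumps exp

print(f"{encode_single('0.05'):08X}")   # 3D4CCCCD
```

The decimal value is passed as a string so that Fraction holds 0.05 exactly, the same way the hand calculation does.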

Examples: Converting the number 0.05 to IEEE double precision coding.

Step 1: Put the number 0.05 in scientific notation using powers of 2. If you multiply 0.05 by 32, you get a number in the form of 1.nf, i.e., (0.05)(32)=1.6. The IEEE format wants the number that way so that the leading value of 1 does not have to be stored, thus saving space for more precision in the rest of the number.

Now the number can be rewritten as

0.05 = (-1)^0 (1.6) 2^(-5) = (-1)^s (1.nf) 2^(EXP-1023)

It is obtained from the fact that 0.05= (0.05*32)(1/32)=(1.6)(2^(-5)).

Comparing the format with the number, we can conclude that the sign bit s=0 for a positive number. Also, EXP-1023 is -5, so EXP=1023-5=1018. The double precision format allows a much larger range for the number. Now convert 1018 from base-10 to base-2. Using dec2hex(1018) in matlab, you get 1018 (base-10) = 3FA (base-16). With that and the HEX table, you can easily write 1018 (base-10) in binary, i.e.,

1018 (base-10) = 3FA (base-16) = 011 1111 1010 (base-2)

The above 11 bits are to be stored in the 11 bits intended for EXP. Now obtain 52 more bits from the nf portion. This is easier than the single precision encoding because we can just extract 13 hex digits by multiplying the fraction .nf by 16 thirteen times. See the following:

1.6: the leading 1 is not stored, so the bits come from the fraction 0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

0.6 *16=9.6 Extract 9 (1001 in binary) new fraction=0.6

(Since the fraction left after the 13th extraction is 0.6, which is more than half, the last 9 should be rounded up to 10, or A.)

Now the big construction begins:

0 011 1111 1010 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010

Now group the bits into 4-bits for easy writing in hex notation:

0011 1111 1010 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1001 1010

That would be how the number 0.05 is stored in IEEE 64-bit floating point format. You could get this answer using the matlab command num2hex(0.05) and the answer is 3FA999999999999A.
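
Both encodings of 0.05 follow the same recipe with different field widths, which can be sketched with one parameterized function. The name encode_ieee and its parameters exp_bits and frac_bits are assumptions introduced for this illustration; the sketch handles only positive, normalized values and rounds an exact tie upward.

```python
from fractions import Fraction

def encode_ieee(value, exp_bits, frac_bits):
    """Hex string of a positive, normalized value for the given field widths."""
    x = Fraction(value)
    bias = 2 ** (exp_bits - 1) - 1          # 127 for 8 bits, 1023 for 11 bits
    e = 0
    while x >= 2:                           # normalize to 1.nf * 2**e
        x /= 2
        e += 1
    while x < 1:
        x *= 2
        e -= 1
    # Round the fraction to frac_bits bits by adding one half before truncating.
    nf = int((x - 1) * 2 ** frac_bits + Fraction(1, 2))
    word = ((bias + e) << frac_bits) + nf   # sign bit 0; a carry from nf bumps exp
    width = (1 + exp_bits + frac_bits) // 4 # number of hex digits in the result
    return f"{word:0{width}X}"

print(encode_ieee("0.05", 11, 52))   # 3FA999999999999A
print(encode_ieee("0.05", 8, 23))    # 3D4CCCCD
```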