Numerical Error: Atkinson defines absolute error as follows -

AE = error = xt - xa,

where xa approximates some exact value xt, and the relative error as

RE = relative error = (xt - xa)/xt.

We will also have occasion to use these logarithmic measures of error:

d-acc = “decimal places of accuracy”= - log10|error|

-acc = “digits of accuracy” = - log10|RE|

Let’s try and justify the d-acc & -acc labeling...

Suppose xa is accurate to at least the dth decimal place (i.e. |AE|  10-d),

where d is an integer. Then d-acc  d, and conversely. In particular, since

d-acc  [d-acc], with [ ] denoting the ‘greatest integer function’, it holds

that xa is accurate to at least [d-acc] decimal places.

If |AE|  10-d, then since |xt|  10e, |RE|  10-(e+d)-acc  e+d.

Here, we’re defining xa to have at least k digits of accuracy*with respect to xt

if the magnitude of the error, |AE|, is less than or equal to 1 unit in the (k+1)st

digitcounting rightward from xt’s first nonzero digit. More precisely,

|AE|  10e-k, for e satisfying |xt| = 10e(d1.d2d3...dk...) with d1 1.

Note: This condition also implies |RE|  10-kso that -acc  k.

*This is more stringent than Atkinson’s definition, i.e.1 unit vs. 5 units.

Claim: If -acc , then xa has at least-1 digits of accuracy wrt xt.

Proof: Suppose that -acc . Then -log10|RE|  |RE|  10-

|AE|  |xt| 10-< 10e-(-1) andso xa has the claimed accuracy.

Note: We can’t do better than - 1digits. Try xt = 5.000 and xa = 4.995.

Then |AE| = .5 10-2and-acc =  = 3, but |AE|10-3 fails to hold.

Claim: Suppose xa is accurate wrt xt to at least the (st place counting

rightward from xt’s first nonzero digit, i.e. assume |AE|  10e-,

where |xt| = (d1.d2d3...) 10e with d11. Then -acc .

Proof: Since |RE| = |AE|/|xt| 10-, this follows immediately.

Atkinson’s #7 on pg. 43, but using the round-to-even rule & an arbitrary :

Claim: For x in the range of the arithmetic, that is if ßL |x|  ßU( – ß1-n),

|(x - fl(x))/x|  .5ß1-n/(1 + .5ß1-n) < .5ß1-n,

where “fl” is the round-to-even floating point operation.

Proof: Letx = sgn(x) ße (d1.d2d3...dndn+1...)ß,z = (d1.d2d3...dn)ß and

y = (.dn+1dn+2dn+3…)ß.

Then we know that fl(x) (and note that fl(-x) = -fl(x)) is either...

(1) sgn(x) ßez + sgn(x) ße ß1-n if y > .5 or if y = .5 & dn is odd,

or

(2) sgn(x) ßez if y < .5 or if y = .5 and dn is even.

Therefore,

(1)  |x - fl(x)| = ßeß1-n(1 - y) and (2) |x - fl(x)| = ße ß1-ny.

Accordingly, in case (1),

|(x - fl(x))/x| ße ß1-n(1 - y)/(ße(1 + ß1-ny))

 .5 ß1-n/(1 + .5 ß1-n)

< .5 ß1-n,

while in case (2),

|(x - fl(x))/x| ß1-ny/(1 + ß1-ny)  .5ß1-n/(1 + .5ß1-n) < .5 ß1-n.

*Since g(y) = ß1-n y/(1 + ß1-n y) is strictly increasing for y > 0.

Rounding-up-fives (aka simple rounding) yields the same inequality while

chopping yields |x - fl(x)| = ße ß1-n y < ße+1-n therefore |(x - fl(x))/x| < ß1-n.

Regarding Addition: Suppose we want to compute S = 1in ai, where we

can ignore the input errors occurring when the ai's are floated, but not the

errors associated with the intermediate sums, i.e. the values

a1 + a2,a1 + a2 + a3, a1 + a2 + a3 + ... + an.

Set S2 = fl(a1 + a2) = (a1 + a2)(1 + 2), S3 = fl(S2 + a3) = (S2 + a3)(1 + 3),...,

Sn = fl(Sn-1 + an) = (Sn-1 + an)(1 + n). Then since

Sn = (a1 + a2) 2in (1 + i) + a33in (1 + i) + a44in (1 + i) +

…+ an-1 (1 + n-1) (1 + n) + an (1 + n)

= (a1 + a2) (1 + 2ini+ r2) + a3 (1 + 3ini + r3) +

…+ an-1 (1 + n-1+ n+rn) + an (1 + n),

where r2, r3, r4,…, are higher power expressions in 2, 3, …, it follows

thatif these expressions are ignored,Sn– S is approximately equal to

(a1 + a2) 2ini + a33ini +… + an-1(n-1+ n) + ann.

This suggests thatwe might minimizethe error if we first sort the ai's from

least to greatest in magnitude, i.e. |a1|  |a2|  |a3|  . . .  |an|, before adding.

Example: Add 1.21, 10.4, 30.4, 50.6, & 70.1. The exact value is 162.71,

but using 3-digit arithmetic with a round-to-even, yields the following -

least to greatest: 163. vs. greatest to least: 162.

rel. error = .00178... rel. error = .00436...