ŰŰCyclic Redundancy Check

Error detection is important whenever there is a non-zero chance of your data getting corrupted. Whether it's an Ethernet packet or a file under the control of your application, you can add a piece of redundant information to validate it.

The simplest example is a parity bit. Many computers use one parity bit per byte of memory. Every time the byte gets written, the computer counts the number of non-zero bits in it. If the number is even, it sets the ninth parity bit, otherwise it clears it. When reading the byte, the computer counts the number of non-zero bits in the byte, plus the parity bit. If any of the nine bits is flipped, the sum will be odd and the computer will halt with a memory error. (Of course, if two bits are flipped--a much rarer occurrence--this system will not detect it.)

For messages longer than one byte, you'd like to store more than one bit of redundant information. You might, for instance, calculate a checksum. Just add together all the bytes in the message and append (or store somewhere else) the sum. Usually the sum is truncated to, say, 32 bits. This system will detect many types of corruption with a reasonable probability. It will, however, fail badly when the message is modified by inverting or swapping groups of bytes. Also, it will fail when you add or remove null bytes.

Calculating a Cyclic Redundancy Check is a much more robust error checking algorithm. In this article I will sketch the mathematical foundations of the CRC calculation and describe two C++ implementations--first the slow but simple one, then the more optimized one.

Polynomials

Here's a simple polynomial, 2x2 - 3x + 7. It is a function of some variable x, which depends only on powers of x. The degree of a polynomial is equal to the highest power of x in it; here it is 2 because of the x2 term. A polynomial is fully specified by listing its coefficients, in this case (2, -3, 7). Notice that to define a degree-d polynomial you have to specify d + 1 coefficients.

It's easy to multiply polynomials. For instance,

(2x2 - 3x + 7) * (x + 2)
= 2x3 + 4x2 - 3x2 - 6x + 7x + 14
= 2x3 + x2 + x + 14

Conversely, it is also possible to divide polynomials. For instance, the above equation can be rewritten as a division:

(2x3 + x2 + x + 14) / (x + 2) = 2x2 - 3x + 7

Just like in integer arithmetic, one polynomial doesn't have to be divisible by another. But you can always divide out the "whole" part and be left with the remainder. For instance x2 - 2x is not divisible by x + 1, but you can calculate the quotient to be x - 3 and the remainder to be 3:

(x2 - 2x) = (x + 1) * (x - 3) + 3

In fact you can use a version of long division to perform such calculations

Arithmetic Modulo Two

Most of us are familiar with polynomials whose coefficients are real numbers. In general, however, you can define polynomials with coefficients taken from arbitrary sets. One such set (in fact a field) consists of the numbers 0 and 1 with arithmetic defined modulo 2. It means that you perform arithmetic as usual, but if you get something greater than 1 you keep only its remainder after division by 2. In particular, if you get 2, you keep 0. Here's the addition table:
0 + 0 = 0
0 + 1 = 1 + 0 = 1
1 + 1 = 0 (because 2 has remainder 0 after dividing by 2)

The multiplication table is equally simple:

0 * 0 = 0
0 * 1 = 1 * 0 = 0
1 * 1 = 1

What's more, subtraction is also well defined (in fact the subtraction table is identical to the addition table) and so is division (except for division by zero). What is nice, from the point of view of computer programming, is that both addition and subtraction modulo 2 are equivalent to bitwise exclusive or (XOR).

Now imagine a polynomial whose coefficients are zeros and ones, with the rule that all arithmetic on these coefficients is performed modulo 2. You can add, subtract, multiply and divide such polynomials (they form a ring). For instance, let's do some easy multiplication:

(1x2 + 0x + 1) * (1x + 1)
= 1x3 + 1x2 + 0x2 + 0x + 1x + 1
= 1x3 + 1x2 + 1x + 1

Let's now simplify our notation by representing a polynomial as a series of coefficients. For instance, 1x2 + 0x + 1 has coefficients (1, 0, 1), 1x + 1 (1, 1), and 1x3 + 1x2 + 1x + 1 (1, 1, 1, 1).

Do you see what I am driving at? A polynomial with coefficients modulo 2 can be represented as a series of bits. Conversely, any series of bits can be looked upon as a polynomial. In particular any binary message, which is nothing but a series of bits, is equivalent to a polynomial.

CRC

Take a binary message and convert it to a polynomial then divide it by another predefined polynomial called the key. The remainder from this division is the CRC. Now transmit both the message and the CRC. The recipient of the transmission does the same operation (divides the message by the same key) and compares his CRC with yours. If they differ, the message must have been mangled. If, on the other hand, they are equal, the odds are pretty good that the message went through uncorrupted. Most localized corruptions (burst of errors) will be caught using this scheme.

Not all keys are equally good. The longer the key, the better error checking. On the other hand, the calculations with long keys can get pretty involved. Ethernet packets use a 32-bit CRC corresponding to degree-31 remainder (remember, you need d + 1 coefficients for a degree-d polynomial). Since the degree of the remainder is always less than the degree of the divisor, the Ethernet key must be a polynomial of degree 32. A polynomial of degree 32 has 33 coefficients requiring a 33-bit number to store it. However, since we know that the highest coefficient (in front of x32) is 1, we don't have to store it. The key used by the Ethernet is 0x04c11db7. It corresponds to the polynomial:

x32 + x26 + ... + x2 + x + 1

There is one more trick used in packaging CRCs. First calculate the CRC for a message to which you have appended 32 zero bits. Suppose that the message had N bits, thus corresponding to degree N-1 polynomial. After appending 32 bits, it will correspond to a degree N + 31 polynomial. The top-level bit that was multiplying xN-1 will be now multiplying xN+31 and so on. In all, this operation is equivalent to multiplying the message polynomial by x32. If we denote the original message polynomial by M (x), the key polynomial by K (x) and the CRC by R (x) (remainder) we have:

M(x) * x32 = Q (x) * K (x) + R (x)

Now add the CRC to the augmented message and send it away. When the recipient calculates the CRC for this sum, and there was no transmission error, he will get zero. That's because:

M(x) * x32 + R (x) = Q (x) * K (x) (no remainder!)

You might think I made a sign mistake--it should be -R (x) on the left. Remember, however, that in arithmetic modulo 2 addition and subtraction are the same!

We'll use this property of the CRC to test our implementation of the algorithm.

ŰŰNaive CRC Calculation

The CRC algorithm requires the division of the message polynomial by the key polynomial. The straightforward implementation follows the idea of long division, except that it's much simpler. The coefficients of our polynomials are ones and zeros. We start with the leftmost coefficient (leftmost bit of the message). If it's zero, we move to the next coefficient. If it's one, we subtract the divisor. Except that subtraction modulo 2 is equivalent to exclusive or, so it's very simple.

Let's do a simple example, dividing a message 100110 by the key 101. Remember that the corresponding polynomials are x5 + x2 + x and x2 + 1. Since the degree of the key is 2, we start by appending two zeros to our message.

10011000 / 101

101

100

101

01

We don't even bother calculating the quotient, all we need is the remainder (the CRC), which is 01 in this case. The original message with the CRC attached reads 10011001. You can easily convince itself that it is divisible by the key, 101, with no remainder.

In practice we don't write the top bit of the key--it is implicit. In this particular example, we would only store bits 01 as our key.

The calculation above could be implemented using a 2-bit register for storing intermediate results (again, the top bit is always one, so we don't store it). Let's rewrite the above example and highlight the bits that are stored in the register at each step. The significant bits of the key are marked in red.

0010011000 / 101

001

010

100

101

010

100

101

001

Notice that we subtract (or XOR, since this is arithmetic modulo 2) the key from the register every time a 1 is shifted out of it.

Implementation

Let's start with the basic class, Crc. It defines the type Crc::Type as a 32-bit unsigned long (corresponding to a 33-bit key). The constructor takes the key and stores it, and it zeroes the register. The method Done returns the result of the CRC calculation. It also zeroes the register, so that it can be used to calculate another CRC.

#include <cassert>

#include <iostream>

#include <string>

class Crc

{

public:

typedef unsigned long Type;

Crc (Type key)

: _key (key), _register (0)

{}

Type Done ()

{

Type tmp = _register;

_register = 0;

return tmp;

}

protected:

Type _key; // really 33-bit key, counting implicit 1 top-bit

Type _register;

};

The straightforward implementation of the CRC algorithm is not very efficient, but it will serve as our starting point. The class SlowCrc implements a public method PutByte as well as a private helper, PutBit. PutByte simply splits the byte into bits and sends them one-by-one to PutBit. The value of the bit is encoded as a bool (true or false).

class SlowCrc: public Crc

{

public:

SlowCrc (Crc::Type key)

: Crc (key)

{}

void PutByte (unsigned char byte);

private:

void PutBit (bool bit);

};

void SlowCrc::PutByte (unsigned char byte)

{

unsigned char mask = 0x80; // leftmost bit

for (int j = 0; j < 8; ++j)

{

PutBit ((byte & mask) != 0);

mask >= 1;

}

}

Here is the heart of the algorithm, PutBit. We pick the top bit from the register, shift the register left by one, inserting a new message bit from the right. If the top bit was one, we XOR the key into the register.

void SlowCrc::PutBit (bool bit)

{

std::cout < bit? "1": "0";

bool topBit = (_register & 0x80000000) != 0;

// shift bit into register from the right

_register <= 1;

_register ^= (bit? 0x1: 0x0); // OR or XOR, same result

if (topBit)

{

// XOR the 32-bits of the key.

// The implicit high bit of the 33-bit key conceptually

// clears the topBit shifted out of the register

_register ^= _key;

}

}

Here is the main routine for testing the algorithm. First we extend the original message by 32 bits. This corresponds to multiplying the original polynomial by x32. Next we calculate the CRC for such an augmented message and add it to it. That corresponds to calculating M (x) * x32 + R (x), where M (x) is the message polynomial and R (x) is the remainder from our division--in other words the CRC.

As a consistency test, we divide this new polynomial (the message with the CRC appended to it) by the key polynomial and make sure we get zero.

int main ()

{

Crc::Type const ethernetKey = 0x04c11db7;

SlowCrc slowCrc (ethernetKey);

// calculate R in: M (x) * x^32 = Q (x) * K (x) + R (x)

std::string msg ("Harry had a little lamp");

size_t origLen = msg.length ();

// "Multiply" message by x^32

msg.resize (origLen + sizeof (Crc::Type));

for (size_t i = 0; i < msg.length (); ++i)

{

std::cout < " ";

slowCrc.PutByte (msg [i]);

}

std::cout < "\n<- ";

Crc::Type crc = slowCrc.Done ();

std::cout < "\n0x" < std::hex < crc < std::endl;

// Now test consistency

// M (x) * x^32 + R (x)

int shift = 32;

for (int j = 0; j < sizeof (SlowCrc::Type); ++j)

{

shift -= 8;

msg [origLen + j]

= static_cast<unsigned char> (crc > shift);

}

// Divide it by K (x) --> should be divisible

for (i = 0; i < msg.length (); ++i)

{

std::cout < " ";

slowCrc.PutByte (msg [i]);

}

std::cout < "\n<- ";

crc = slowCrc.Done ();

std::cout < "\n0x" < std::hex < crc < std::endl;

assert (crc == 0);

return 0;

}

Next, we'll implement the optimized version.

ŰŰOptimized CRC Calculation

The main trick in optimizing the CRC calculation is to operate on bytes rather than bits. Look at the top 8 bits of the CRC register. Scan this byte from left to right. Every time you find a non-zero bit, you XOR the key into the register at a bit offset such that this bit is turned off by the top invisible bit of the key. (Notice that this operation can modify the remaining bits in the byte you are scanning.) XORing is like addition, it is associative and you can do it in any order. So you can XOR together a bunch of shifted keys corresponding to a given byte and store it for later. This is simply a section of the slow CRC algorithm applied to our pre-calculation:

// for all possible byte values

for (unsigned i = 0; i < 256; ++i)

{

// put this byte at the top of the register

Crc::Type reg = i < 24;

// for all bits in a byte

for (int j = 0; j < 8; ++j)

{

bool topBit = (reg & 0x80000000) != 0;

reg <= 1;

if (topBit)

reg ^= _key;

}

_table [i] = reg;

}

Once the table is pre-calculated, we can use it to evaluate the CRC of any message byte-by-byte. At each step, we shift one byte out of the register and shift in the next message byte. We use the shifted byte as an index to our pre-computed table. The value from the table contains the accumulated XOR of appropriately shifted keys. We XOR this value into the register and continue the process until the whole message is consumed.

In fact we can do even better. We can keep XORing the accumulated keys into the register, but postpone the XORing of the message byte until its time comes to be shifted out of the register. Imagine data flowing into the register and out of it.

As a bonus, we won't be wasting the first four cycles on pushing the null bytes through the register and we won't need to append four null bytes to the end of the message.

Optimized Implementation

The class FastCrc contains a static table with pre-computed values. This table is actually stored inside a class Table that is private to the class FastCrc. Notice the overloaded operator [] which makes Table behave like an array.

class FastCrc: public Crc

{

public:

FastCrc (Crc::Type key)

: Crc (key)

{

_table.Init (key);

}

void PutByte (unsigned byte);

private:

class Table

{

public:

Table () : _key (0) {}

void Init (Crc::Type key);

Crc::Type operator [] (unsigned i)

{

return _table [i];

}

private:

Crc::Type _table [256];

Crc::Type _key;

};

private:

static Table _table;

};

The implementation file must contain the definition of the static member _table

FastCrc::Table FastCrc::_table;

The table is initialized when the first FastCrc object is created and then reinitialized only if the key changes.

void FastCrc::Table::Init (Crc::Type key)

{

assert (key != 0);

if (key == _key)

return;

_key = key;

// for all possible byte values

for (unsigned i = 0; i < 256; ++i)

{

Crc::Type reg = i < 24;

// for all bits in a byte

for (int j = 0; j < 8; ++j)

{

bool topBit = (reg & 0x80000000) != 0;

reg <= 1;

if (topBit)

reg ^= _key;

}

_table [i] = reg;

}

}

Look at the simplicity of the PutByte method:

void FastCrc::PutByte (unsigned byte)

{

unsigned top = _register > 24;

top ^= byte;

_register = (_register < 8) ^ _table [top];

}

The message byte is XORed only after the top byte is shifted out of the register. A new stack of keys obtained from the table is then XORed into the register.

To test our implementation, we will calculate the CRC using both the slow and the fast algorithms.

int main ()

{

std::string msg ("Harry had a little lamp");

std::string exMsg (msg);

exMsg.resize (msg.length () + 4);

Crc::Type const ethernetKey = 0x04c11db7;

SlowCrc slowCrc (ethernetKey);

for (size_t i = 0; i < exMsg.length (); ++i)

{

slowCrc.PutByte (exMsg [i]);

}

Crc::Type crc = slowCrc.Done ();

std::cout < "\n0x" < std::hex < crc < std::endl;

FastCrc fastCrc (ethernetKey);

for (size_t j = 0; j < msg.length (); ++j)

{

fastCrc.PutByte (msg [j]);

}

Crc::Type newCrc = fastCrc.Done ();

std::cout < "\n0x" < std::hex < newCrc < std::endl;

return 0;

}