Cryptographic Hash Functions

Kevin P. Dyer

What is a (cryptographic) hash function?

A hash function is a primitive used for many cryptographic applications to map an arbitrary length message to a fixed hash value. They are not meant to preserve data; it should be viewed as a method to create a fingerprint of a given message.

Some useful definitions and desired properties

A hash function is a mapping of the following:

hash : {0,1}* {0,1}n

(* Denotes arbitrary length bit-string, n is the fixed output size in bits)

A hash function is considered to be good if it has the following three properties: one-way, weakly collision free, and strongly collision free. The strongly collision free property implies the weakly collision free property and the one-way property. Let us define those three properties.

One-way property: Given z, one can not easily find x such that hash(x) = z. If we can find such an x, we will say that we have solved the one-way problem.

Weakly collision free property: Given x it is difficult to find x  y such that hash(x) = hash(y).

Strongly collision free property: It is difficult to find any x, y with x  y such that hash(x) = hash(y).

Dependent upon the application, one or all of these may be necessary to ensure desired security.

What is the current state of hash functions?

Most commonly, MD5 and SHA-1 are implemented in protocols in use today. As of the late 90’s many experts advised developers to move away from MD5 because certain weaknesses were found and it is considered broken. All functions within the MD4-family have been shown to not be collision resistant, but there are no know attacks against the one-way property.

As of today, NIST recommends migration from SHA-1 by 2010. In the interim, they are having public dialogue about the new standard for hash functions. This process is very similar to the one for AES in 2000-2001.

What does “broken” actually mean and how does it affect privacy?

When a hash function is “broken” it is not necessarily a security concern. For example, MD5 has been shown to not be collision resistant. For an application such as password storage the only necessary attribute is the one-way property. Since MD5 holds the one-way property, it is still secure for this particular application. Furthermore, some functions are only theoretically broken, they don’t hold properties of a hash function of the desired output domain. Simply stated: given the output size, we could have better security.

What is the current state of research?

Results published during the last decade provided important information about the possible exploitation of current standards. The predominant focus of experts has been the relation between addition over Z2^32 and F2. This is now very well understood and defined. Most recently discovered weaknesses lie within the actual methods of a given algorithm. Many hash functions have been created in an ad-hoc manner, especially functions within the MD4 family. Typically a standard is published and is considered secure until a weakness is found. The current call for a new standard is attempting to establish new and more theoretically secure methods for generating a hash. The use of number theory and a rock-solid mathematical underlying is the key to the next generation of hash functions, in lieu of simply augmenting the existing standards. This is something that may not be possible in the near future but it is a desired long-term goal for the cryptographic community.