Steganography Is the Science of Hiding Information in Data

AMLAN CHAKRABORTY

0440954

Steganography and Digital Watermarking -- Applications, Attacks and Countermeasures

Introduction

Steganography is the science of hiding information in data. Normally steganography is done intelligently such that it is difficult for an adversary to detect the existence of a hidden message in the otherwise innocuous data. The piece of data that has the message embedded in it is visible to the world in the clear and appears as harmless and normal. This is in stark contrast with cryptography where the message is scrambled to make it extremely difficult or impossible for an adversary to put together. A message in ciphertext arouses some sort of suspicion whereas invisible message embedded in clear text does not. This is the advantage of steganography.

Generally, a steganographic message will appear to be something else: a picture, an audio file, a video file or a message in clear text - the covertext. Historically, messages were written using hidden invisible ink between the visible lines of innocuous documents, or even written onto clothing. Other techniques used were writing messages in Morse code in knitting yarn, or marking particular words or letters in the message, using invisible ink or pin prick that form the secret message. During WWII Germans used the microdot technology, where an image the size of a period had the clarity of typewritten pages. In this case the period was the covertext and the image is the message. Though smart hiding and innocuous hiding techniques are used to hide the stegotext, the algorithm itself is secure and only known to the communicating parties and not to the world. This is in slight contrast to classical cryptography where the algorithm is well known and only the key(s) are secret. Though data is not encrypted in steganography, authenticity of a message is normally established by using a MAC or a signature.

Steganography can be used to code messages in any transport layer – an image (GIF/BMP/JPEG), a MP3 file, a communications protocol like UDP etc. Steganogrpahic information can also be added to richer multimedia content like DVDs. There are normally two motivations – to send a secret message or to establish authenticity of a piece of information – usually a multimedia file. The later is a major application of modern steganography and known as Digital Watermarking and Fingerprinting. Watermarks establish ownership of an artifact while fingerprints or labels help to identify intellectual property violators. They are different protocol implementation of the same basic idea.

Information theory and human sensory perception

Steganography is possible for the same reasons that compression is – a combination of information theory and human perception of vision and audio. Digital signal contains redundancy which manifests itself as noise. Humans cannot detect all levels of noise; in other words, humans often cannot tell an image or an audio clip from another with slight difference in levels of noise. The larger the cover message is (in data content terms — number of bits) relative to the hidden message, the easier it is to hide the latter. For example, a 24 bit bitmap image has 8 bits representing three colors – Red, green and blue at each pixel – 256 shades of each basic color. So changing the least significant bit of any of these basic colors would make an extremely negligible change on that pixel – and possibly less on the image. So the least significant bit can be easily used to store the steganographic message. So, if we change the LSB of each basic color of three adjacent pixels, we get 9 bits -- enough space to store an ASCII character. This is called LSB manipulation and a very conventional and simple steganographic implementation. It can also be noted that the actual message itself can be compressed using some compression coding methodologies like run length coding. Historically, a lot of invisible ink steganographic messages were encoded using Polybius squares or similar text to integer mapping schemes.

Stated somewhat more formally, the objective for making steganographic encoding difficult to detect is to ensure that the changes to the carrier/container (the original signal) due to the injection of the payload (the signal to covertly embed) are visually and ideally, statistically negligible; that is to say, the changes are indistinguishable from the Gaussian noise of the carrier. From an information theoretical point of view, this means that the channel must have more capacity than the 'surface' signal requires (entropy), i.e., there is redundancy. For a digital image, this may be noise from the imaging element; for digital audio, it may be noise from recording techniques – amplitude or frequency modulation. Any system with an analog (signal) amplification stage will also introduce thermal noise, which can be exploited as a noise cover. Steganographic channel is a covert channel in Information theory terms since it transfers some kind of information using a method originally not intended to transfer this kind of information. Steganography also supports both storage and timing covert channels. This report primarily discusses storage covert channels where a covert message is communicated by manipulating a stored object like an image. Ron Rivest’s “Chaffing and Winnowing” protocol discussed later can be argued as an example of timing covert channel.

It is fairly obvious that more the data content of the cover message, the easier it is to hide the message. In case of images, bitmaps are better fits that GIFs and JPEGs because GIF is 8 bits per pixel and JPEG is a lossy compression technique. But on the flipside, bigger images will attract more attention than smaller images as suspect stego-images. Subtlety in changes is a very important feature and stego-images should only have subtle changes. An image with large areas of solid colors would be a bad fit since large variances created by the embedded message would cause drastic differences easily spotted by the human eye. The spatial frequency distribution of the image (spatio- temporal in case of audio or video content) is also a determining factor in the efficiency of the hiding process. As we will see later, we have techniques for both Gaussian and LaPlacian distribution using maximum likelihood estimators for the stego-messages.

Often the embedded message is itself encrypted using a key that may or may not be known to the adversary. Since steganography requires that communicating parties have some prior shared information, symmetric key is a natural fit. However, public steganography with steganographic key exchanges is also possible.

Prisoner’s problem and subliminal channel

The study of steganography in machine cryptography was first stated in the prisoner’s problem by Simmons. Two inmates Alice and Bob are accomplices in a crime and are sent to the prison. They need to communicate with each other but they have to use a public channel which is monitored by the Warden of the jail. The warden will only forward the messages if they are intelligible. The prisoners accept this condition and find a way to communicate secretly in exchanges --- establishing a subliminal channel even though the messages themselves are not encrypted. The warden will also try to deceive them, so they will authenticate each other’s messages before accepting them – authentication without secrecy.

Thethe situation is paradoxical because the warden demands access and the prisoner’s need to authenticate each other. Authentication without secrecy channels achieve that by placing a pre arranged condition on all messages. It is this capability that creates a subliminal channel for the prisoners. If ‘m’ redundant bits are allowed to establish authenticity, then these redundant bits create a bit by bit subliminal channel which can be used to transmit extra information.

Null ciphers and “Chaffing and Winnowing”

A null cipher is a form of encryption where the plaintext is mixed with a large amount of non-cipher material. Null ciphers are used to hide the actual ciphertext by introducing nulls to confuse the cryptanalyst. Classical steganography can also be thought of as an extension of this concept where the carrier / container data are actually the null ciphers – data that create confusion and diffuse the actual payload.

Ron Rivest extended this concept to an idea of “Chaffing and Winnowing” to create steganographic communication channels. The concept is analogous to separating (winnowing) wheat from chaff where wheat is the actual payload and chaff is the null ciphers. In a two step process, the transmitter introduces chaff to the wheat i.e. intersperse the actual payload with meaningless data. The receiver “winnows” the actual payload from the non-interesting data. As with most steganographic transfers, the transmitters add a MAC to establish authenticity of the communication to any message that is sent. MACs are calculated over the entire message and a serial number of the message using a secret symmetric authentication key. The transmitter attaches bogus MACs for the chaff packets instead of calculating it. This is what distinguishes the “chaff” from the “wheat”. The receiver now doesn’t have to do anything special since the normal protocol of a receiver is to discard packets that do not have correct MACs. Though the adversary can see the entire communication, it cannot tell chaff from wheat as the MAC will look like a random function. However, weak MAC functions can potentially leak information in this protocol. It is also important to note that it is not possible to use digital signatures here since anyone will be then able to compare the signatures and tell “chaff” from “wheat”. However, “designated verifier signature” schemes where only signature designates can verify a signature would work fine. The other key idea is that since the creation of “chaff” involves generation of a bad MAC and not the knowledge of a secret key, any entity can play the role of a “chaffer”.

Digital Watermarking and Fingerprinting

Digital watermarking is the technique of adding identifying information to digital artifacts using steganographic principles i.e. hiding the information cleverly so that extraction is difficult by any adversary. Watermarks can be visible or invisible in the context of images. There are various techniques of placing digital watermarks on images but they can conceptually be divided into two categories –

Spatial techniques. These methods are based on hiding the messages on geometric characteristics of the image. These are highly susceptible to signal alteration algorithms. Even simple signal manipulation like zooming, cropping, smoothing would obliterate watermarks.
Frequency Domain techniques. These methods are used to hide messages along the frequency distribution of hues, intensities, luminance etc of the images. These are comparatively robust to simple image manipulations but can fall prey to statistical steganalysis.

In strict terms, visible digital watermarks are really not steganographic object – they enhance information instead of hiding.

Fingerprinting is a slight different implementation of digital watermarks. When an artifact is sold to an entity, information about that entity is hidden in the artifact. If illegitimate copies of the artifact are sold, the watermark information would reveal the violator. A slight modification of this would be using the canary trap protocol where unique alterations are made to each copy of artifact sold. The illegitimate copy has a tell-a-tale that traces back to the violator.

Some digital watermarking algorithms

It is not too difficult to formulate algorithms that can cleverly hide information in images. The key idea to avoid detection is to hide the message in such a way that statistically it comes across like normal distribution making pattern detection very difficult.

Masking and filtering

These are some basic techniques to create visible watermarks by altering the luminance or colors of certain regions in the image. These can be detected very easily by simple statistical analysis but these are fairly resistant to lossy compression and image cropping. It doesn’t hide the data in noise but embed it in significant areas – just the reverse of LSB manipulation.

LSB Manipulation

This is the manipulation described in the Introduction that is susceptible to even slight image modification. It is very efficient in hiding a GIF or BMP image in another but a linear analysis is enough to figure this out. It is fairly easy to hide an image in 3 or even 4 least significant bits of another image without causing major noticeable change. The motivation for steganography is important here. If the intention is to covertly pass messages, this can still work unless all artifacts are sniffed for steganographic information. But if this is meant for digital watermarks, it is very easy to extract and /or get rid of the info.

Spread Spectrum methods

In spread spectrum methods, the message is scattered across the image making it harder for cropping, rotation and other basic image manipulation techniques to obliterate the watermark. This is also somewhat resistant to statistical steganalysis because it gives it the impression of noise in an image. Patchwork is a tool from IBM uses this technique to scatter hidden information based on statistical distribution of luminance in the image. It iteratively selects two patches on the image, brightens one and darkens one. It then calculates the standard deviation, S between light and dark patches over the sample patches. To encode, it picks up two patches up in random and then brightens one by S and darkens one by S. This process is iterated and the whole image palette is laid in a mosaic of bright and dark patches one of which is used to hide data. This patch information is vital to decode the hidden message later. This is clearly a frequency distribution method. Patchwork makes the assumption that the image has a Gaussian distribution.

Texture Block coding

In this method, pairs of areas of similar texture are found and one area is copied over the other. Thus we have identical blocks of texture in the image. Iterating a few times, we can get two large blocks of identical textures. These two blocks would get altered identically for all non-geometric alterations of the image. These two blocks can then contain information about these images.

M-Sequences using linear shift registers

M-sequences are based on starting vectors of a Fibonacci recursion relation which form a Galois field of finite cardinality. Mathematically and statistically these numbers are known to have desirable autocorrelation functions; the distribution of Galois field numbers is known to be of normal distribution thus resembling Gaussian noise in an image. So images encoded using m-sequences are statistically impossible to distinguish from the original as they are similar to noise in a normal distribution. If the stego message is encoded using m-sequences, it can easily be embedded in the image by a LSB substitution. A more secure implementation would be to use LSB addition instead to embed the watermark. So it will require the examination of the complete bit pattern and the current linear shift register implementation. This is more secure because to crack this, the adversary would have to do the same computations without any apriori knowledge.

Frequency hopping

In this method scattering of the message is done on the basis of rules that change cumulatively. The idea is similar to DES block encryption; bits are swapped according to rules that are dictated by the stego-key and random data from the previous round. White noise storm, an implementation of this methodology, creates a message space of 8 channels where each channel has a window of W bytes, where W is a random number. Each channel however carry only one bit of the message and a lot of unused bits. The bits inside a window permutate and rotate according to an algorithm that is regulated by the previous window’s operations and the stego-key. Finally this encoded message is embedded in the image using LSB substitution. The idea again is to simulate a distribution that is similar to a Gaussian distribution.

Steganalysis and Digital Watermarking Attacks

Steganalysis is analogous to cryptanalysis in the context of steganography. Steganalysis is composed of three steps:-

Detection of hidden message (Passive Steganalysis)
Extracting of hidden message (Active Steganalysis)
Disabling/ Destruction of hidden message.

It is important to note here that it is not necessary to extract a message to disable or destruct a message. It is often very difficult to extract a hidden message and at times even to detect one because they are scattered and show up as noise. The case of visible watermarks is obviously different. But the problem lies in the fact – detection is also not important if we have a “suspicious attitude”. We can run algorithms that are known to destruct digital watermarks in messages. On top of that there are algorithms that instead of disabling watermarks, either overwrite watermarks or create exact replicas – rendering the watermark useless either way. Luis Von Ahn et al formulates and proposed “universal robustness” for steganographic information. They prove that “robust steganography” is as secure as the underlying crypto used to encrypt the message that is hidden in the clear. But this just ensures extraction is hard and likens it to cryptography. But their algorithm doesn’t prove that obliteration of the steganographic secret is not possible.