Department of Computer Science
Lecture notes:
IMAGE COMPRESSION
Pasi Fränti
Abstract: The course introduces image compression methods for binary, gray-scale, color-palette, true-color and video images. Topics include Huffman coding, arithmetic coding, Golomb coding, run-length modelling, predictive modelling, statistical modelling, context modelling, progressive image decompression, transform-based modelling, wavelets, vector quantization, and fractal-based compression. Existing and forthcoming standards such as JBIG1, JBIG2, JPEG, JPEG-LS and JPEG-2000 will be covered. The main emphasis is on the compression algorithms in these standards.
Joensuu
9.9.2002
Table of contents:
IMAGE COMPRESSION
"A picture takes more than thousand bytes"
1. Introduction
1.1 Image types
1.2 Lossy versus lossless compression
1.3 Performance criteria in image compression
2 Fundamentals in data compression
2.1 Modelling
2.2 Coding
3 Binary images
3.1 Run-length coding
3.2 READ code
3.3 CCITT group 3 and group 4 standards
3.4 Block coding
3.5 JBIG
3.6 JBIG2
3.7 Summary of binary image compression algorithms
4 Continuous tone images
4.1 Lossless and near-lossless compression
4.2 Block truncation coding
4.3 Vector quantization
4.4 JPEG
4.5 Wavelet
4.6 Fractal coding
5 Video images
Literature
Appendix A: CCITT test images
Appendix B: Gray-scale test images
1. Introduction
The purpose of compression is to code the image data into a compact form, minimizing both the number of bits in the representation and the distortion caused by the compression. The importance of image compression is emphasized by the huge amount of data in raster images: a typical gray-scale image of 512×512 pixels, each represented by 8 bits, contains 256 kilobytes of data. With the color information, the number of bytes is tripled. In the case of video at 25 frames per second, even one second of color film requires approximately 19 megabytes of memory, so a typical hard disk of a PC machine (540 MB) can store only about 30 seconds of film. Thus, the necessity for compression is obvious.
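As a quick check of these figures, the raw storage requirements can be computed directly; the Python sketch below simply restates the arithmetic of the paragraph above (the function name is ours, introduced only for illustration).

def raw_size_bytes(width, height, bits_per_pixel):
    """Size of an uncompressed raster image in bytes."""
    return width * height * bits_per_pixel // 8

gray = raw_size_bytes(512, 512, 8)      # 262144 bytes = 256 KB
color = raw_size_bytes(512, 512, 24)    # three 8-bit components per pixel
video_per_second = color * 25           # 25 frames per second

print(gray / 1024, "KB per gray-scale image")            # 256 KB
print(video_per_second / 2**20, "MB per second of video") # about 19 MB
print(540 * 2**20 // video_per_second, "seconds on a 540 MB disk")  # roughly 30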
A number of universal data compression algorithms exist that can compress almost any kind of data; the best known are the family of Ziv-Lempel algorithms. These methods are lossless in the sense that they retain all the information of the compressed data. However, they do not take advantage of the 2-dimensional nature of the image data. Moreover, lossless compression can save only a small portion of the data, and thus lossy methods are more widely used in image compression. The use of lossy compression is always a trade-off between the bit rate and the image quality.
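To illustrate the limited savings of a universal method on raw pixel data, one could, for example, run Python's zlib (a Ziv-Lempel/DEFLATE based compressor) on an image buffer; the file name below is a hypothetical 512×512, 8-bit raw image.

# Illustrative only: compress raw pixel data with a universal compressor.
import zlib

with open("lena.raw", "rb") as f:      # hypothetical raw gray-scale image
    pixels = f.read()

compressed = zlib.compress(pixels, 9)  # level 9 = best compression
ratio = len(pixels) / len(compressed)
print(f"compression ratio {ratio:.2f}:1")
# Typically only a modest ratio for natural gray-scale images, which is why
# lossy, image-specific methods are preferred when lower bit rates are needed.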
1.1 Image types
From the compression point of view, the images can be classified as follows:
· binary images
· gray-scale images
· color images
· video images
An illustration of this classification is given in Figure 1.1. Note that the groups of gray-scale, color, and video images are closely related to each other, but there is a gap between the binary images and the gray-scale images. This demonstrates the separation of the compression algorithms. The methods that are designed for gray-scale images can also be applied to color and video images. However, they usually do not apply to binary images, which are a distinct class of images from this point of view.
For comparison, Figure 1.1 also shows the class of textual data. The fundamental difference between images and, for example, English text is the 2-dimensionality of the image data. Another important property is that the gray-scale values form an ordered, numerical scale, which is not true for English text. It is not evident that consecutive symbols, e.g. 'a' and 'b', are close to each other, whereas the gray-scale values 41 and 42 are. These properties distinguish image data from other data such as English text.
Note also that the class of color-palette images appears on the borderline between image data and non-image data. This reflects the lack of an ordered numerical alphabet in color-palette images, which makes them closer to other data. In fact, color-palette images are often compressed by universal compression algorithms, see Section 4.7.
Figure 1.1: Classification of the images from the compression point of view.
1.2 Lossy versus lossless compression
A compression algorithm is lossless (or information preserving, or reversible) if the decompressed image is identical with the original. Respectively, a compression method is lossy (or irreversible) if the reconstructed image is only an approximation of the original one. The information-preserving property is usually desirable, but it is not obligatory in all applications.
The motivation for lossy compression originates from the inability of the lossless algorithms to produce as low bit rates as desired. Figure 1.2 illustrates typical compression performances for different types of images and types of compression. As one can see from the example, the situation is significantly different with binary and gray-scale images. In binary image compression, very good compression results can be achieved without any loss in the image quality. On the other hand, the results for gray-scale images are much less satisfactory. This deficiency is emphasized because of the large amount of the original image data when compared to a binary image of equal resolution.
Figure 1.2: Example of typical compression performance.
The fundamental question of lossy compression techniques is where to lose information. The simplest answer is that information should be lost wherever the distortion is least. This depends on how we define distortion. We will return to this matter in more detail in Section 1.3.
The primary use of images is for human observation. Therefore it is possible to take advantage of the limitations of the human visual system and lose some information that is less visible to the human eye. On the other hand, the desired information in an image may not always be visible to the human eye. To discover the essential information, an expert in the field and/or in image processing and analysis may be needed, cf. medical applications.
In the definition of lossless compression, it is assumed that the original image is in digital form. However, one must always keep in mind that the actual source may be an analog view of the real world. Therefore loss in image quality already takes place in image digitization, where the picture is converted from an analog signal to digital form. This can be performed by an image scanner, digital camera, or any other suitable technique.
The principal parameters of digitization are the sampling rate (or scanning resolution) and the accuracy of the representation (bits per sample). The required resolution (relative to the viewing distance), on the other hand, depends on the purpose of the image. One may want to watch the image as an entity, but the observer may also want to enlarge (zoom) the image to see the details. The characteristics of the human eye can therefore not be utilized unless the application in question is definitely known.
Here we will ignore the digitization phase and assume that the images are already stored in digital form. These matters, however, should not be ignored when designing the entire image processing application. It is still worth mentioning that while the lossy methods seem to be the mainstream of research, there is still a need for lossless methods, especially in medical imaging and remote sensing (i.e. satellite imaging).
1.3 Performance criteria in image compression
The aim of image compression is to transform an image into compressed form so that the information content is preserved as much as possible. Compression efficiency is the principal parameter of a compression technique, but it is not sufficient by itself. It is simple to design a compression algorithm that achieves a low bit rate, but the challenge is how to preserve the quality of the reconstructed image at the same time. The two main criteria of measuring the performance of an image compression algorithm thus are compression efficiency and distortion caused by the compression algorithm. The standard way to measure them is to fix a certain bit rate and then compare the distortion caused by different methods.
The third feature of importance is the speed of the compression and decompression process. In online applications the waiting times of the user are often critical factors. In the extreme case, a compression algorithm is useless if its processing time causes an intolerable delay in the image processing application. In an image archiving system one can tolerate longer compression times if the compression can be done as a background task. However, fast decompression is usually desired.
Among other interesting features of compression techniques we may mention robustness against transmission errors and the memory requirements of the algorithm. The compressed image file is normally an object of a data transmission operation. In its simplest form the transmission is between internal memory and secondary storage, but it can as well be between two remote sites via transmission lines. Data transmission systems commonly contain fault-tolerant internal data formats, so this property is not always obligatory. The memory requirements are often of secondary importance; however, they may be a crucial factor in hardware implementations.
From the practical point of view, the last but often not the least feature is the complexity of the algorithm itself, i.e. the ease of implementation. The reliability of the software often depends highly on the complexity of the algorithm. Let us next examine how these criteria can be measured.
Compression efficiency
The most obvious measure of the compression efficiency is the bit rate, which gives the average number of bits per stored pixel of the image:
bit rate = C / N  (bits per pixel) (1.1)
where C is the number of bits in the compressed file, and N (=X×Y) is the number of pixels in the image. If the bit rate is very low, compression ratio might be a more practical measure:
compression ratio = kN / C (1.2)
where k is the number of bits per pixel in the original image. The overhead information (header) of the files is ignored here.
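For concreteness, both measures can be computed directly from the file and image sizes; the Python sketch below uses the notation C, N and k from above (the function names and the example figure of 65536 compressed bytes are ours, chosen only for illustration).

def bit_rate(compressed_bits, width, height):
    """Bit rate = C / N, in bits per pixel (Eq. 1.1)."""
    return compressed_bits / (width * height)

def compression_ratio(original_bpp, compressed_bits, width, height):
    """Compression ratio = k*N / C (Eq. 1.2)."""
    return original_bpp * width * height / compressed_bits

# Example: a 512x512, 8 bpp image compressed to 65536 bytes (hypothetical figure)
C = 65536 * 8
print(bit_rate(C, 512, 512))              # 2.0 bits per pixel
print(compression_ratio(8, C, 512, 512))  # 4.0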
Distortion
Distortion measures can be divided into two categories: subjective and objective measures. A distortion measure is said to be subjective if the quality is evaluated by humans. The use of human analysts, however, is quite impractical and therefore rarely used. The weakest point of this method is the subjectivity in the first place. It is impossible to establish a single group of humans (preferably experts in the field) that everyone could consult to get a quality evaluation of their pictures. Moreover, the definition of distortion highly depends on the application, i.e. the best quality evaluation is not always made by people at all.
In the objective measures the distortion is calculated as the difference between the original and the reconstructed image by a predefined function. It is assumed that the original image is perfect. All changes are considered as occurrences of distortion, no matter how they appear to a human observer. The quantitative distortion of the reconstructed image is commonly measured by the mean absolute error (MAE), mean square error (MSE), and peak-to-peak signal to noise ratio (PSNR):
MAE = (1/N) Σ |x_i − y_i| (1.3)
MSE = (1/N) Σ (x_i − y_i)^2 (1.4)
PSNR = 10 · log10( 255^2 / MSE ), assuming k=8. (1.5)
where x_i and y_i denote the pixel values of the original and the reconstructed image, respectively, and the sums are taken over all N pixels.
These measures are widely used in the literature. Unfortunately these measures do not always coincide with the evaluations of a human expert. The human eye, for example, does not observe small changes of intensity between individual pixels, but is sensitive to the changes in the average value and contrast in larger regions. Thus, one approach would be to calculate the mean values and variances of some small regions in the image, and then compare them between the original and the reconstructed image. Another deficiency of these distortion functions is that they measure only local, pixel-by-pixel differences, and do not consider global artifacts, like blockiness, blurring, or the jaggedness of the edges.
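The three objective measures defined above are straightforward to implement; a minimal sketch using NumPy, assuming the original and reconstructed images are given as equal-sized arrays of 8-bit pixel values:

import numpy as np

def mae(original, reconstructed):
    """Mean absolute error, Eq. (1.3)."""
    return np.mean(np.abs(original.astype(float) - reconstructed.astype(float)))

def mse(original, reconstructed):
    """Mean square error, Eq. (1.4)."""
    return np.mean((original.astype(float) - reconstructed.astype(float)) ** 2)

def psnr(original, reconstructed, k=8):
    """Peak signal-to-noise ratio in dB, Eq. (1.5); peak value is 2^k - 1."""
    peak = 2 ** k - 1
    return 10.0 * np.log10(peak ** 2 / mse(original, reconstructed))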
2 Fundamentals in data compression
Data compression can be seen as consisting of two separate components: modelling and coding. Modelling in image compression addresses the following issues:
· How, and in what order, is the image processed?
· What are the symbols (pixels, blocks) to be coded?
· What is the statistical model of these symbols?
The coding consists merely of the selection of the code table: what codes will be assigned to the symbols to be coded. The code table should match the statistical model as well as possible to obtain the best possible compression. The key idea of coding is to apply variable-length codes so that more frequent symbols are coded with fewer bits (per pixel) than the less frequent symbols. The only requirement of the coding is that it is uniquely decodable, i.e. any two different input files must result in different code sequences.
A desirable (but not necessary) property of a code is the so-called prefix property, i.e. no code of any symbol can be a prefix of the code of another symbol. The consequence of this is that the codes are instantaneously decodable, i.e. a symbol can be recognized from the code stream right after its last bit has been received. Well-known prefix codes are Shannon-Fano and Huffman codes. They can be constructed empirically on the basis of the source. Another coding scheme, known as Golomb-Rice coding, also produces prefix codes, but it presumes a certain distribution of the source.
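To make the idea of a prefix code concrete, the following Python sketch builds a Huffman code table from empirical symbol counts; it is a textbook bottom-up construction, the sample string is arbitrary, and it is not the exact procedure of any particular standard.

import heapq
from collections import Counter

def huffman_code(data):
    """Build a Huffman code table {symbol: bit string} from symbol frequencies."""
    heap = [(count, i, {symbol: ""})
            for i, (symbol, count) in enumerate(Counter(data).items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)   # two least frequent subtrees
        c2, i, t2 = heapq.heappop(heap)
        # Prepend 0 to the codes of one subtree and 1 to those of the other.
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (c1 + c2, i, merged))
    return heap[0][2]

codes = huffman_code("aaaabbbcc d")
print(codes)  # frequent symbols get shorter codes; no code is a prefix of another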