Contributions to a Study of Sub-Band Image

Coding Using Adaptive Multi-Level

Block Truncation Coding

By

Jason Kidd

Supervisor: Professor David Coll

A report submitted in partial fulfillment of the requirements

of 94.497 Engineering Project

Department of Systems and Computer Engineering

Faculty of Engineering

Carleton University

April 4, 2003

Abstract

There are certain key variables within Professor Coll’s Adaptive Multi-Level Block Truncation Coding algorithm which need to be optimized so that the software can compress and then reconstruct an image with maximum image quality and minimum filesize. An iterative approach was used to experimentally determine optimal values for thresfactor and quantfactor, whose final values were found to be 0.85 and 1.55, respectively.

The Zone Definition Algorithm defines which subbands contribute the greatest amount of energy to the image, and how we weight these subbands to simultaneously maintain the image quality and reduce the filesize. Adaptive Quantization helps maintain image quality by narrowing in on areas of high image activity to represent them more accurately. It is used in conjunction with the Zone Definition Algorithm to effectively weight the importance of each subband. Adaptive Quantization involves computing the standard deviation of the values of the pixels belonging to a block, and comparing it to a pre-defined threshold. It is therefore very important to define an optimal threshold value which can then be applied to a ‘zone’ of subbands with similar energies to determine the level of Adaptive Quantization. The result was a total of 4 ‘zones’ of AC subbands, each having an independent threshold. Starting with the zone of AC subbands containing the most energy, and proceeding through successive zones in decreasing order of energy, the optimal threshold values were found to be 3.0, 1.0, 1.0, and 1.0, respectively.

Documentation was also added to the existing code which should increase legibility and expedite the learning process for future generations of developers.

Table of Contents

  1. The Need for Image Compression
     1.1 Introduction
     1.2 Problem Motivation
     1.3 Problem Statement
     1.4 Proposed Solution
     1.5 Accomplishments
  2. My Contributions
     2.1 Terminology
     2.2 Project Details
  3. Conclusions and Recommendations
  4. References
  5. Appendices
     5.1 Appendix A
     5.2 Appendix B
     5.3 Appendix C

List of Figures

Figure / Title
1.1 / Energy distribution of a spectrum
1.2 / A visual interpretation of the threshold boundaries
1.3 / A visual interpretation of the quantization values
1.4 / A visual interpretation of Adaptive Quantization
1.5 / A visual interpretation of the Zone Definition Algorithm (ZDA)
2.1 / A visual interpretation of a standard deviation

Introduction

As we move into the 21st century, communications technology is converging towards one common theme: multimedia. Surfing the web is no longer limited to text. In fact, by today’s standards, a website with no images or sounds is quite often looked down upon and quickly tossed aside in favour of a more visually pleasing one. Even portable products with limited memory, such as laptops, PDAs and cell phones, now have the ability to store and display digital images.

The demand for digital images has risen dramatically in recent years, and the end is not in sight. There are many uses for digital images, some of which include satellite imaging (LANDSAT), MRI (magnetic resonance imaging), CT (computed tomography), law enforcement, digital cameras, and, of course, the internet. The reason for such demand for digital images is simple: they are easy to manipulate and to transfer over any digital channel.

However, there are significant drawbacks to a digital image in raw form. A raw digital image requires a very large number of bits to store all of the information it contains. This not only leads to increased storage costs, but also contributes significantly to the overall transmission time and bandwidth cost of an image traveling over a network.

The problem of image compression becomes more apparent when we consider a digital video stream. In layman’s terms, a digital video is simply a series of still digital images shown in rapid succession, typically in the range of 15 to 30 frames per second. If proper compression techniques were used to compress each individual frame of a 30 frame-per-second video to a very reasonable one third of its original size, then the overall video would be compressed by roughly 66%. This can translate into savings of tens or even hundreds of megabytes compared to the original filesize. From a network point of view, a compressed digital image or video can reduce transfer time significantly and save on bandwidth and storage costs. As multimedia continues to become an increasingly larger part of our everyday lives, image compression technology is becoming more and more important so that we can store and transfer these applications and images efficiently.

Upon closer inspection of the characteristics of a digital image, one can see that a raw digital image contains a great deal of redundant information. The underlying reason behind image compression is that if we were to remove this redundant information we could compress the byte-size of an image by a significant amount, and thus considerably decrease the costs required for transmission and storage.

Since the inception of digital images, computer scientists and engineers have been developing techniques to reduce these redundancies and maximize compression without dramatically reducing the image quality. This is where the art and science of image compression come into play. Image compression is the art of efficiently coding the spectrum of a digital image by discarding redundant image information. It is important to note that we are not actually compressing the image per se, but rather the spectrum of the image, which is a collection of spatially-oriented, like-ordered coefficients called sub-bands. Through various methods, we can significantly reduce the number of bytes required to represent an image without severely degrading the overall image quality.

Problem Motivation

The need for image compression becomes apparent when we consider the number of bytes required to store an image. Consider a typical image commonly found on the internet, with the following format: 512 x 512 pixels, 8 bits per pixel, and 3 colours. Storing all of this information in binary form would require roughly 6 megabits. Now consider a more professional use of a digital image, for example an MRI. High-resolution scans of this type can easily range as high as 5000 x 6000 pixels at 12 bits per pixel, represented in a monochrome scheme. Such a precise image can require up to 360 megabits in raw form. To put this into perspective, if we were to transfer an MRI image over a standard 56 kbps modem running at its optimal speed, the transfer would take roughly 1 hour and 45 minutes! This is simply not acceptable, and it is evident that compression is needed.
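As a quick check of these figures, the short Python snippet below simply reproduces the arithmetic above; it is an illustration only and not part of the project software.

```python
# Back-of-the-envelope check of the storage and transfer figures quoted above.
# Illustrative only; this snippet is not part of the project code.

web_image_bits = 512 * 512 * 8 * 3    # 512 x 512 pixels, 8 bits/pixel, 3 colours
mri_image_bits = 5000 * 6000 * 12     # 5000 x 6000 pixels, 12 bits/pixel, monochrome

modem_bps = 56000                     # a 56 kbps modem at its nominal rate
mri_transfer_min = mri_image_bits / modem_bps / 60

print(f"Web image: {web_image_bits / 1e6:.2f} megabits")      # just over 6 megabits
print(f"MRI image: {mri_image_bits / 1e6:.0f} megabits")      # 360 megabits
print(f"MRI over 56 kbps: {mri_transfer_min:.0f} minutes")    # roughly 1 3/4 hours
```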

In today’s constantly evolving technological society, the bar for achieving the greatest compression ratio is always being raised. Compressed digital images that were considered “small enough” yesterday are not “small enough” by today’s standards. For example, a multimedia application containing 1000 raw images obviously requires some sort of image compression to reduce the overall filesize of the application. Using a standard image compression algorithm which reduces each raw image by an average of 1000 bits would yield a savings of (1000 images * 1000 bits per image), or 1 megabit. However, if we were to extend this compression algorithm to achieve a higher compression ratio, reducing the filesize of each raw image by an average of 1200 bits, that would lead to an overall savings of (1000 images * 1200 bits per image), or 1.2 megabits. At first glance, reducing the application’s filesize by 0.2 megabits may not seem so astounding. But if the application’s developer expected to electronically distribute 500,000 copies of the program to customers, this would yield a bandwidth savings of 100,000 megabits! Provided the quality of the image remained the same under both compression algorithms, the improved technology would quickly become the new standard in image compression.

There are various techniques we can employ to compress an image. A common approach is to represent each pixel with fewer bits than the bits per pixel of the original image. This method is known as quantization. For example, reducing a pixel from an 8-bit resolution to a 2-bit resolution would reduce the overall image filesize by a factor of four; such is the case with a 3-level quantizer, whose three levels fit within 2 bits. At first glance this may seem rather trivial, but the precise workings of this method are in fact quite complex. Using two bits, there are only four combinations available, so if we were to simply represent each pixel in the entire image by two bits, we would have an image composed of only four different tones, which would yield a grossly inaccurate image. The solution lies in Block Truncation Coding, or BTC. I will elaborate on this technique in detail later in this report.

Static quantization divides an image into n x n blocks of pixels and performs BTC based on the statistics collected within each block. This method is appropriate for regions of low activity (or low frequencies), because the pixels within such a block do not vary greatly and fit comfortably into one of the three zones defined by the 3-level quantizer. However, for areas of high activity (or high frequencies), pixel values can be distributed across a large range. A solution to this problem is to use a smaller block size, which yields a smaller sample size and, in turn, a greater likelihood of fitting all the pixel values comfortably into one of the three quantizer zones. In these situations, adaptive quantization is used to represent an image block more accurately by dividing the n x n block into four smaller blocks of m x m, where m = n/2. Adaptive quantization is an iterative process in which the need to divide a block is determined by computing the standard deviation of the pixel values within that block and comparing it to a predefined threshold. If the standard deviation is larger than the threshold, then we divide the block into four smaller, equal-size blocks. The goal is to narrow in on regions of high activity and represent those regions with smaller sample sizes to yield a more accurate image. I will elaborate further on Adaptive Quantization later in this report.

The zone definition algorithm is important for assigning an appropriate weight to each ‘zone’ of a given spectrum. Looking at a typical spectrum (Figure 2 of Appendix A), we noticed that the lowest frequencies (low degree of activity) were located in the upper-left corner, while the highest frequencies (high degree of activity) could be found in the lower-right corner. Any given image is mostly composed of low frequencies, which correspond to slowly varying colours. High frequencies mark very abrupt changes in an image, such as the ‘edge’ of a foreground object against the background, and therefore contribute little more than fine detail to the overall image. If we weight certain zones of the spectrum according to their importance in the overall image, we can considerably reduce the filesize of the image.

In general, the low frequencies provide roughly 98% of the information in the picture. Therefore, if we were to cut along a diagonal line reaching from the lower-left corner to the upper-right corner and discard everything below it, we could effectively reduce the filesize by a factor of 2 while sacrificing only about 2% of the information (see Figure 1.1). However, we don’t literally cut off the lower-right half of the spectrum. Instead, we create ‘zones’ for each diagonal line of sub-bands, starting from the upper-left corner, and weight them according to how much energy they contribute to the overall image.

Figure 1.1: Energy distribution of a spectrum
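To make the idea of diagonal zones concrete, the sketch below groups the sub-bands of a spectrum by their anti-diagonal index, with the DC sub-band in the upper-left corner. The 8 x 8 grid of sub-bands is an assumption made purely for illustration; the actual grouping of diagonals into the four zones used in this project, and the threshold assigned to each zone, are defined by the Zone Definition Algorithm in the existing code.

```python
# Illustrative grouping of sub-bands into diagonal 'zones'.
# An 8 x 8 grid of sub-bands is assumed here; the real zone boundaries and
# per-zone weights are set by the Zone Definition Algorithm, not by this sketch.

N = 8  # sub-bands per side (assumed)

# Diagonal index 0 holds the DC sub-band in the upper-left corner; larger
# indices move toward the high-frequency sub-bands in the lower-right corner.
diagonals = {}
for row in range(N):
    for col in range(N):
        diagonals.setdefault(row + col, []).append((row, col))

# Sub-bands on the same diagonal carry comparable energy, so each diagonal
# (or a group of adjacent diagonals) can share one threshold and one weight.
for d in sorted(diagonals):
    print(f"diagonal {d:2d}: {len(diagonals[d])} sub-bands")
```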

Documenting code with proper comments is an invaluable practice which eases the learning process for future generations of developers and simplifies future upgrades. I was surprised to find a lack of proper documentation in the existing code, and I experienced difficulty deciphering and understanding the function of certain variables and operations. I felt that it was in my best interest, as well as in the interest of future generations, to properly document and style the code as per standard documentation practices.

Problem Statement

There are various areas that can be improved upon to increase the current image compression ratio while maintaining the image quality. These techniques are listed below.

Block Truncation Coding

With the Block Truncation Coding (BTC) algorithm we are using here, an image is segmented into non-overlapping 64 x 64 blocks of pixels. We then collect the statistics of each block (maximum value, minimum value, standard deviation, and mean), which we use to determine approximations of that block. The next step is to quantize each individual pixel value using a 3-level quantizer, which assigns it a value of low, average, or high. This is achieved through the use of the threshold variable. Defining the three zones (low, average, high) requires two boundary markers. Each boundary marker is independent for each 64 x 64 block, and this information must be stored in the compressed image at a rate of one byte per block. For this reason, we define only one threshold value and place it at the appropriate position on the positive x-axis, then negate this value and place it on the negative x-axis (see Figure 1.2). This yields two threshold boundaries symmetric about the block mean, which is ideally zero. The advantage of this method is that we only have to store one threshold value, making +threshold the upper boundary and -threshold the lower boundary. The equations by which the value of threshold is derived are as follows:

+threshold = mean + (thresfactor * std)

-threshold = mean - (thresfactor * std)

Figure 1.2: A visual interpretation of the threshold boundaries

Since we are only dealing with 3 different levels (low, average, high), each pixel value can be represented by 2 bits, exactly a quarter of the number of bits needed to represent an 8-bit pixel in the original image.

From the local block statistics we collected earlier, we can calculate and assign an appropriate value to those pixels that fall into the ‘low’ and ‘high’ zones. We call this variable quantization, and it is independent for each block. Again, this variable will be stored in the compressed image, and therefore from a filesize perspective it is in our best interest to have only one quantization variable and negate it to obtain two values that are equidistant from the x-axis (see Figure 1.3). This means that a value of +quantization will be given to pixels that fall in the ‘high’ zone, and a value of -quantization will be given to pixels that fall in the ‘low’ zone. We consider the ‘average’ value always to be zero. The value of quantization is determined by the following equations:

+quantization = mean + (quantfactor * std)

-quantization = mean - (quantfactor * std)

Figure 1.3: A visual interpretation of the quantization values
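To tie the two pairs of equations together, the sketch below encodes and reconstructs a single block with the 3-level quantizer described above. It is a minimal illustration rather than the project's implementation: the block contents are made up, and the thresfactor and quantfactor values are simply the optimal values reported in the Abstract.

```python
import numpy as np

THRESFACTOR = 0.85   # optimal value reported in the Abstract
QUANTFACTOR = 1.55   # optimal value reported in the Abstract

def encode_block(block):
    """Assign each pixel a 2-bit code: 0 = low, 1 = average, 2 = high."""
    mean, std = block.mean(), block.std()

    pos_threshold = mean + THRESFACTOR * std   # +threshold
    neg_threshold = mean - THRESFACTOR * std   # -threshold

    codes = np.ones(block.shape, dtype=np.uint8)   # default: average zone
    codes[block > pos_threshold] = 2               # high zone
    codes[block < neg_threshold] = 0               # low zone

    pos_quant = mean + QUANTFACTOR * std   # +quantization (for high pixels)
    neg_quant = mean - QUANTFACTOR * std   # -quantization (for low pixels)
    return codes, neg_quant, mean, pos_quant

def decode_block(codes, neg_quant, mean, pos_quant):
    """Rebuild the block from the 2-bit codes and the per-block statistics.
    For AC sub-band coefficients the mean is close to zero, which is why the
    text treats the reconstruction value of the 'average' zone as zero."""
    levels = np.array([neg_quant, mean, pos_quant])
    return levels[codes]

# Example on one 8 x 8 block of made-up coefficients.
rng = np.random.default_rng(0)
block = rng.normal(0.0, 10.0, size=(8, 8))
codes, lo, avg, hi = encode_block(block)
restored = decode_block(codes, lo, avg, hi)
```

Storing 2 bits per pixel plus the per-block side information is what produces the 3.56:1 compression ratio worked out in the next paragraph.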

For example, a typical 8 x 8 block contains 64 pixels. At a colour depth of 8 bits per pixel, this block requires 64 x 8 = 512 bits to represent it. However, using BTC to compress the image into an encoded format, the same block would require only 2 bits per pixel and two bytes, one for quantization and one for the threshold. This means that we could store the same block in (64 x 2) + 8 + 8 = 144 bits. This yields a compression ratio of 512/144 = 3.56.

This algorithm seems rather trivial at first glance, but as we delve deeper into the intricacies of the procedure we can see that it can become very complex. An obvious problem lies in the fact that the quality of the image relies heavily on assigning appropriate values to quantfactor and thresfactor. If the threshold values were set at large distances from the y-axis, then virtually all pixels would fall into the average zone during quantization. Conversely, if they were set too close to the y-axis, then virtually all pixels would fall into the high and low zones. Assuming we choose an appropriate value for threshold and all pixels are distributed appropriately, the next issue is precisely what values to assign to the pixels that fall in the low and high zones. The reconstructed image quality relies heavily on these two factors, and it is important to obtain an optimal value for each. One task is to determine which values of thresfactor and quantfactor maximize the image quality and minimize the filesize.

Adaptive Quantization

An inherent problem with standard BTC arises when we consider exceptional pixel values.

Imagine that an extreme value is present in a certain 8 x 8 block: 63 of the pixel values fall comfortably within one of the three threshold zones, but one pixel’s true value is much higher than the high zone of the quantizer. Using standard BTC, this deviant pixel would be forced into the high zone and therefore would be represented by the average high value (the quantization value for that block), which is much lower than that pixel’s true value. Therefore, upon compression and reconstruction of an image, the reconstructed image would not accurately represent the original. Depending on the circumstances and use of the image, such errors can sometimes be tolerated. An image that is being prepared for general use on the internet usually does not require great detail, but for a more specific use, such as a CT scan, we would want to preserve as much detail as possible. An appropriate solution would therefore be to devise a more dynamic system which can accurately represent these deviant pixels without having a large impact on the compression ratio. The answer lies in adaptive quantization.

Adaptive quantization is based on one simple fact: a smaller sample size statistically represents the energies found in that area more accurately. For instance, a region of high activity in an image should be treated with a smaller sample size for higher accuracy, while regions of low activity can remain with larger block sizes and thus a larger sample size.

In Prof. Coll’s Adaptive Quantization algorithm, a block always starts off with a size of 64 x 64 pixels. After collecting and analyzing the statistics of the block, we can determine key characteristics such as the mean value and the standard deviation. We then compare the standard deviation to a predefined threshold value. If the standard deviation is below the threshold, then there are not enough deviant pixels to warrant a smaller block size. On the other hand, if the standard deviation is above the threshold, then some energies present in this block are too high to be tolerated, and the block must be split so that the sample size is smaller and we can narrow in on the regions of high activity. For example, if the standard deviation of a 64 x 64 block is found to be above the threshold, the block is divided into four smaller blocks, each containing 32 x 32 pixels (see Figure 1.4).
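The sketch below illustrates this splitting rule. It assumes square blocks with power-of-two side lengths, a single fixed threshold, and an arbitrary minimum block size of 8 x 8 pixels; in the actual algorithm each zone of sub-bands carries its own threshold, and each final block is then encoded with the 3-level quantizer.

```python
import numpy as np

def adaptive_split(block, threshold, min_size=8):
    """Recursively divide a block while its standard deviation exceeds the
    threshold.  Returns a list of (row_offset, col_offset, sub_block) leaves.
    A sketch of the splitting rule only; the min_size of 8 is an assumption,
    not a parameter taken from the project code."""
    return _split(block, 0, 0, threshold, min_size)

def _split(block, r0, c0, threshold, min_size):
    n = block.shape[0]
    # Keep the block whole if it is quiet enough or cannot be divided further.
    if block.std() <= threshold or n <= min_size:
        return [(r0, c0, block)]

    m = n // 2  # each child block is m x m, where m = n / 2
    leaves = []
    for dr in (0, m):
        for dc in (0, m):
            child = block[dr:dr + m, dc:dc + m]
            leaves += _split(child, r0 + dr, c0 + dc, threshold, min_size)
    return leaves

# Example: a 64 x 64 block of made-up coefficients with one active corner.
rng = np.random.default_rng(1)
block = rng.normal(0.0, 0.5, size=(64, 64))
block[:16, :16] += rng.normal(0.0, 20.0, size=(16, 16))   # high-activity region

# 3.0 is the threshold reported in the Abstract for the highest-energy zone.
leaves = adaptive_split(block, threshold=3.0)
print(f"{len(leaves)} blocks after adaptive splitting")
```

Only the corner containing high activity keeps being subdivided; the quiet regions remain as larger blocks, which is exactly the behaviour that lets adaptive quantization preserve detail without a large cost in filesize.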