Image Compression
with 2-D Discrete Cosine Transforms

Initial Project Proposal – (9/21/99)
John Hill

David Oltmanns
Delayne Vaughn
Proposal Index

Project Objectives

Deliverable

Project Background in Brief

Hardware/Software Component Description

Compression/Decompression Module(s)

Digital Camera/Camera Interface - (Connectix QuickCam)

SRAM and Memory Control Module

Serial Interface Module for FPGA

Subsystems Development Decomposition

Timeline

Team Member Responsibilities

Components

Pricing

Special Testing Environments

Hardware, Software Collaborations

References

Project Objectives

Our design will use a Connectix QuickCam and PC interfaced to a Xilinx FPGA. We hope to achieve the following goals:

  • Capture the image using QuickCam
  • Compress the image using a 2-D Discrete Cosine Transform (DCT) implemented on a Xilinx FPGA
  • Decompress the image using the Inverse DCT (IDCT)
  • Display the decompressed image on a PC, using a serial port as the interface

Deliverable

Our final deliverable will be a compression/decompression system utilizing an FPGA interfaced with a Connectix QuickCam and a PC. The system will allow an image to be captured by the camera, compressed and decompressed by the modules built on the FPGA, and finally returned to the PC. The FPGA will be outfitted with 2-D DCT logic and will also incorporate a serial interface, camera interface, and memory control module. See figure 1 below.


figure 1

Project Background in Brief

Image compression has been of interest to the computing community for quite some time. Given the limited bandwidth available in today's marketplace, there is strong demand for compressing data, especially images. The two factors of highest priority are speed and compression ratio.

It has long been acknowledged that the Karhunen-Loève Transform (KLT) is optimal for signal compression, but it is not feasible to implement. Introduced in 1974 by Ahmed, Natarajan, and Rao, the Discrete Cosine Transform (DCT) presents an alternative to the KLT; they demonstrated how closely the DCT approximates it. The two-dimensional DCT for a square N x N matrix is defined as follows:

    t(i,j) = c(i,j) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} s(x,y) \cos\left[ \frac{(2x+1) i \pi}{2N} \right] \cos\left[ \frac{(2y+1) j \pi}{2N} \right]
where c(i,j) is given by c(0,j) = 1/N, c(i,0) = 1/N, and c(i,j) = 2/N for both i and j ≠ 0. The input matrix is s, and t is the output matrix. Like the Fourier transform, the DCT maps the signal to the frequency domain; in fact, the DCT is often computed indirectly by first computing an FFT (Fast Fourier Transform). The output matrix, t, represents the different frequency components of the image. The upper left-hand corner of the DCT matrix holds the coefficients for the low-frequency components, while the lower right-hand corner corresponds to the high frequencies.
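
As an illustration, a direct (unoptimized) C implementation of this definition for an 8x8 block, such as we will start from in the PC simulation, might look like the sketch below. The function name dct_8x8 and the use of double-precision arithmetic are placeholder choices of ours, not part of any existing module.

    #include <math.h>

    #define N 8

    /* Direct 2-D DCT of an 8x8 block, computed straight from the definition:
       s is the input (spatial) block, t is the output (frequency) block. */
    void dct_8x8(const double s[N][N], double t[N][N])
    {
        int i, j, x, y;
        for (i = 0; i < N; i++) {
            for (j = 0; j < N; j++) {
                /* c(i,j) as given in the text: 1/N when i or j is 0,
                   2/N when both are nonzero */
                double c = ((i == 0 || j == 0) ? 1.0 : 2.0) / N;
                double sum = 0.0;
                for (x = 0; x < N; x++)
                    for (y = 0; y < N; y++)
                        sum += s[x][y]
                             * cos((2 * x + 1) * i * M_PI / (2.0 * N))
                             * cos((2 * y + 1) * j * M_PI / (2.0 * N));
                t[i][j] = c * sum;
            }
        }
    }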

Once the DCT has been computed, the next step in compression is quantization, which is basically discarding less-important information. The human eye is much less responsive to very high-frequency components, so a quantization matrix is derived which rounds the entries in the DCT matrix, giving more attention to the lower frequencies while often virtually disregarding the higher ones. Quantization is the part of the process that actually allows for compression. The quantization matrix can be altered to create an acceptable balance between image quality and compression ratio. Once quantization has occurred, the data are encoded into a bit-stream, in which form the image is stored or transported.
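
To illustrate, a minimal C sketch of the quantize/dequantize step follows. The function names are placeholders of ours, and the quantization matrix 'quant' is simply passed in; the actual matrix will be derived later.

    #include <math.h>

    /* Quantize: divide each DCT coefficient by the matching entry of the
       quantization matrix and round.  The rounding error is the information
       that is permanently discarded, which is what yields compression. */
    void quantize_8x8(const double t[8][8], const int quant[8][8], int q[8][8])
    {
        int i, j;
        for (i = 0; i < 8; i++)
            for (j = 0; j < 8; j++)
                q[i][j] = (int)floor(t[i][j] / quant[i][j] + 0.5);
    }

    /* Dequantize: multiply back by the same matrix before the IDCT. */
    void dequantize_8x8(const int q[8][8], const int quant[8][8], double t[8][8])
    {
        int i, j;
        for (i = 0; i < 8; i++)
            for (j = 0; j < 8; j++)
                t[i][j] = (double)q[i][j] * quant[i][j];
    }

Making the entries of 'quant' larger toward the lower right-hand corner discards more of the high-frequency detail, which is how the balance between image quality and compression ratio is tuned.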

Image decompression occurs in basically the opposite order: the bit-stream is first decoded, then dequantized, and finally the inverse DCT (IDCT) is used to reconstruct the original signal. The IDCT is defined as:

    s(x,y) = \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} c(i,j) t(i,j) \cos\left[ \frac{(2x+1) i \pi}{2N} \right] \cos\left[ \frac{(2y+1) j \pi}{2N} \right]
where all variables are defined as above.
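
A matching C sketch of the inverse (again, the name idct_8x8 is a placeholder of ours) simply moves the c(i,j) factor inside the sums over the coefficients:

    #include <math.h>

    #define N 8

    /* Direct inverse 2-D DCT: rebuild the spatial block s from the coefficient
       block t.  Note: for the pair to be exactly invertible, the standard
       orthonormal scale factor when exactly one of i, j is zero is sqrt(2)/N;
       we follow the c(i,j) stated above, which is adequate for a first
       software prototype. */
    void idct_8x8(const double t[N][N], double s[N][N])
    {
        int i, j, x, y;
        for (x = 0; x < N; x++) {
            for (y = 0; y < N; y++) {
                double sum = 0.0;
                for (i = 0; i < N; i++) {
                    for (j = 0; j < N; j++) {
                        double c = ((i == 0 || j == 0) ? 1.0 : 2.0) / N;
                        sum += c * t[i][j]
                             * cos((2 * x + 1) * i * M_PI / (2.0 * N))
                             * cos((2 * y + 1) * j * M_PI / (2.0 * N));
                    }
                }
                s[x][y] = sum;
            }
        }
    }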

Being very computationally intensive and thus somewhat cumbersome, the DCT does not lend itself to real-time applications when implemented in software. To combat this speed lag, we propose to implement the DCT in hardware. Through the years, a variety of algorithms have been presented to quickly compute the DCT. We have chosen to implement a recursive algorithm based on the one presented by Hou in 1987 and updated in 1991 by Cho and Lee. The benefits of this class of algorithm are myriad. First, they are spatially efficient, requiring fewer adders and multipliers than direct row-column approaches. Second, they are computationally stable, as opposed to alternative algorithms that involve the inversion of cosine coefficients. Third, they are direct computations of the DCT, i.e., they do not require the use of the FFT or some other transform. Finally, they provide excellent speed gains, which, after all, is the reason for porting the transform to hardware in the first place. A signal flow graph for Cho and Lee's algorithm follows in figure 2.

figure 2

Hardware/Software Component Description

The final system will involve the collaboration of several modules and devices, as seen below in figure 3. This diagram shows the data flow between the individual components built on the FPGA and the various external devices.

figure 3

Compression/Decompression Module(s)

This aspect of the project is obviously the most challenging. Our initial goal is to get the algorithms working on the PC before moving them to the FPGA board(s). We will migrate the compression, decompression, and finally the other modules one at a time so as to avoid mass confusion. See Subsystems Development Decomposition below.

This segment of the system will process 8x8 blocks, fed to it in succession via the Memory Control Module. The compressed matrices will be passed to the decompression handler, and eventually back to the PC by way of the SRAM and serial interfaces.
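
As a rough sketch of how the PC simulation might drive this block-by-block flow (compress_block and decompress_block are hypothetical stand-ins for the stages described in the next paragraph, and the image is assumed to have dimensions that are multiples of 8):

    /* Hypothetical per-block stages; in the real system these map onto the
       DCT/quantize/encode and decode/dequantize/IDCT modules. */
    void compress_block(double block[8][8]);
    void decompress_block(double block[8][8]);

    /* Walk a w x h grayscale image in 8x8 blocks, round-tripping each block
       through compression and decompression. */
    void process_image(const unsigned char *in, unsigned char *out, int w, int h)
    {
        int bx, by, x, y;
        for (by = 0; by < h; by += 8) {
            for (bx = 0; bx < w; bx += 8) {
                double block[8][8];

                for (y = 0; y < 8; y++)
                    for (x = 0; x < 8; x++)
                        block[y][x] = in[(by + y) * w + (bx + x)];

                compress_block(block);
                decompress_block(block);

                for (y = 0; y < 8; y++)
                    for (x = 0; x < 8; x++) {
                        double v = block[y][x];
                        if (v < 0.0)   v = 0.0;     /* clamp back to pixel range */
                        if (v > 255.0) v = 255.0;
                        out[(by + y) * w + (bx + x)] = (unsigned char)(v + 0.5);
                    }
            }
        }
    }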

Fast Discrete Cosine Transforms, as discussed in the background section, will be utilized, along with quantization and encoding, to compress the image. Likewise, the IDCT will accomplish the decompression. Figure 4 shows this process.

figure 4

Digital Camera/Camera Interface - (Connectix QuickCam)

The Connectix QuickCam has been used before in other projects. We will be looking to the Autonomous Tracking Unit Project (Spring 1999) and the Sign Language Acquisition and Recognition System (Summer 1999) to reestablish the connection between the camera and the FPGA.

We believe this approach will suit us best for the final system. However, we will be looking to develop a serial connection between the PC and FPGA early on, so we can emulate the camera and even part of the compression process before migrating the entire system to the FPGA. The camera and its interface will be added after the compression modules are completed.

figure 5

Figure 5 shows the subset of a previous group's design that interfaces with the camera. We plan on deploying a similar configuration.

SRAM and Memory Control Module

For storing the 8x8 bitmap chunks and computing the transforms, we will need a source of temporary storage. For this we will use an SRAM component. We will interface with the memory via a Memory Control Module, and again we hope to borrow from the experience of previous semesters (Sign Language Acquisition and Recognition System).

The Memory Control Module (Verilog) will handle all interactions with the memory. Standard read and write instructions will be needed. Additionally, functions such as ‘getRow’ and ‘getBlock’ may be included as convenience functions for dealing with 8x8 blocks of memory (1x8 rows at a time).
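
During the PC simulation phase, a small C model of this interface could stand in for the eventual Verilog module. The buffer, image dimensions, and the names get_row/get_block below are placeholders that simply mirror the ‘getRow’/‘getBlock’ idea:

    #define IMG_W 320   /* placeholder image dimensions */
    #define IMG_H 240

    /* Software stand-in for the SRAM holding the captured image. */
    static unsigned char image_mem[IMG_W * IMG_H];

    /* Copy one 1x8 row of the 8x8 block at block coordinates (bx, by). */
    void get_row(int bx, int by, int row, unsigned char dst[8])
    {
        int base = (by * 8 + row) * IMG_W + bx * 8;
        int i;
        for (i = 0; i < 8; i++)
            dst[i] = image_mem[base + i];
    }

    /* Fetch a full 8x8 block, one 1x8 row at a time. */
    void get_block(int bx, int by, unsigned char dst[8][8])
    {
        int row;
        for (row = 0; row < 8; row++)
            get_row(bx, by, row, dst[row]);
    }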

Serial Interface Module for FPGA

The Serial Interface Module will allow us to connect a PC to our system. While we only plan on using the PC to display our final decompressed image, it will be immensely valuable for the development of our subsystems. By using a PC to emulate our final design, we can develop each component individually for the FPGA. The PC will, in effect, stand in for the unfinished modules.
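
For the PC side of this link, a minimal POSIX sketch of opening and configuring the port is shown below. The device name and baud rate are placeholders, the final framing will depend on the UART we build on the FPGA, and a Windows version would use the Win32 CreateFile/SetCommState calls instead.

    #include <fcntl.h>
    #include <termios.h>
    #include <unistd.h>

    /* Open a serial port and configure it for raw 8-N-1 transfers.
       Returns a file descriptor, or -1 on failure. */
    int open_serial(const char *dev)
    {
        struct termios tio;
        int fd = open(dev, O_RDWR | O_NOCTTY);
        if (fd < 0)
            return -1;

        tcgetattr(fd, &tio);
        cfmakeraw(&tio);             /* raw byte stream: no echo, no line editing */
        cfsetispeed(&tio, B115200);  /* placeholder baud rate */
        cfsetospeed(&tio, B115200);
        tcsetattr(fd, TCSANOW, &tio);
        return fd;
    }

    /* Usage sketch: fd = open_serial("/dev/ttyS0");
       write(fd, image, image_len);    send the raw image to the FPGA
       read(fd, result, result_len);   read back the decompressed image */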


figure 6

We plan to take existing serial knowledge that has been developed by previous students and integrate it with the rest of our design. One such source is the PDACS project (Spring 1999). Figure 6 shows one scenario that this group used in their project.

Subsystems Development Decomposition

1) First, we will simulate our entire design on a PC. This will allow us to test our algorithm easily. We can then proceed to migrate the system to the FPGA piece by piece, using the PC as a stand-in for the unfinished components. Figure 7 shows this layout.

figure 7

2) When the algorithm is working, we will move the task of compression to the FPGA. The serial interface will allow us to continue to use the PC for decompression. Figure 8 shows this layout.

figure 8

3) Similarly, we can test the decompression sub-system as a stand-alone module. This will hopefully make troubleshooting less painful. Figure 9 shows this layout.

figure 9

4) When our compression and decompression routines are both in place, we can finish the system by connecting the camera directly to the FPGA. This final step will phase out the PC as part of the compression/decompression sub-system. Figure 10 shows this layout.

figure 10

Timeline

Week Three / Sept. 12 – 18
Research DCT and sub-systems
Begin Proposal
Week Four / Sept. 19 – 25
Finalize Proposal
Present Proposal (Sept. 21)
Begin Software Implementation of Compression Algorithm
Week Five / Sept. 26 – Oct. 2
Finish Software Compression Algorithm
Begin Software Decompression Algorithm
Integrate Compression/Decompression Software Components
Begin Serial Interface Module
Week Six / Oct. 3 – 9
Continue Serial Interface Module
Begin SRAM Control Module
Biweekly Report 1 (Oct. 7)
Week Seven / Oct. 10 – 16
Finish Serial Interface Module
Finish SRAM Control Module
Begin FPGA Compression Module
Week Eight / Oct. 17 – 23
Finish FPGA Compression Module
Integrate SRAM, Serial Interface, and Compression Modules
Biweekly Report 2 (Oct. 21)
Week Nine / Oct. 24 – 30
Benchmark: Test Integrated FPGA Compression
Begin FPGA Decompression Module
Begin Camera Interface
Begin Mid-term Presentation
Week Ten / Oct. 31 – Nov. 6
Finish FPGA Decompression Module
Mid-term Presentation (Nov. 4)
Integrate Decompression and Compression Modules
Week Eleven / Nov. 7 – 13
Finish Camera Interface
Integrate Camera Interface with Existing Modules
Week Twelve / Nov. 14 – 20
Benchmark: Camera Interface with Existing Modules
Biweekly Report 3 (Nov. 18)
Week Thirteen / Nov. 21 – 27
Begin Final Presentation
Thanksgiving
Week Fourteen / Nov. 28 – Dec. 3
Finalize Presentation
Overflow (Unforeseen Tasks)
Week Fifteen / Dec. 4 – 10
Final Presentation (Dec. 7)

Team Member Responsibilities

Task / Team Member(s)
Research DCT and sub-systems / John, David, Delayne
Prepare Proposal / John, David, Delayne
Present Proposal (Sept. 21) / John, David, Delayne
Software Implementation of Compression Algorithm / David, Delayne
Software Implementation of Decompression Algorithm / David, Delayne
Integrate Compression/Decompression Software Components / David, Delayne
Serial Interface Module / John
SRAM Control Module / John
Biweekly Report 1 (Oct. 7) / John, David, Delayne
FPGA Compression Module / John, David, Delayne
Integrate SRAM, Serial Interface, and Compression Modules / John, David
Biweekly Report 2 (Oct. 21) / John, David, Delayne
Benchmark: Test Integrated FPGA Compression / John, David
FPGA Decompression Module / Delayne
Camera Interface / David
Prepare Mid-term Presentation / John, David, Delayne
Mid-term Presentation (Nov. 4) / John, David, Delayne
Integrate FPGA Decompression and Compression Modules / Delayne
Integrate Camera Interface with Existing Modules / John, David
Benchmark: Camera Interface with Existing Modules / John, David, Delayne
Biweekly Report 3 (Nov. 18) / John, David, Delayne
Prepare Final Presentation / John, David, Delayne
Final Presentation (Dec. 7) / John, David, Delayne

Components

  • Computer
  • Software including: Xilinx Foundation Series, LabVIEW
  • FPGA(s)
  • SRAM
  • Project Board/Power Supply
  • Xilinx Interface Cable

Pricing

Part Name / Quantity / Price
Xilinx XC4010E PC84 FPGA / 2 / $62.35 (each)
Connectix QuickCam / 1 / $49.95
WINBOND W24257AX-15 SRAM / 1

Special Testing Environments

Our benchmark tests, as well as our final system, will not require any special environments beyond what is already provided in the CPSC 483 lab.

Hardware, Software Collaborations

Our project incorporates software and hardware components into the design and development of the final system. During development, the system will move from a primarily software-based implementation to one that takes advantage of the speed of hardware. The figure below represents the full scope of our project.

References

N. Ahmed, T. Natarajan, and K.R. Rao, "Discrete cosine transform," IEEE Trans. Comput., vol. C-23, pp. 90-93, Jan. 1974.

K. Aldrich, D. Brandenberger, C. Chilek, and B. Raymond, "Sign Language Acquisition and Recognition System," (Sept. 20, 1999).

M. Berger, J. Curtin, T. Griffin, A. King, M. Nordfelt, and J. Whitted, "Portable Digital Compression/Decompression System," (Sept. 20, 1999).

J. Berglund, R. Cuaycong, W. Day, A. Fikes, and K. Shah, "Autonomous Tracking Unit," (Sept. 20, 1999).

N.I. Cho and S.U. Lee, "Fast algorithm and implementation of 2-D discrete cosine transform," IEEE Trans. CAS, Mar. 1991, pp. 297-305.

N.I. Cho, I.D. Yun, and S.U. Lee, "On the regular structure for the fast 2D DCT algorithm," IEEE Trans. CAS, Apr. 1993, pp. 259-266.

S.C. Chan and K.L. Ho, "A new 2D fast cosine transform algorithm," IEEE Trans. SP, Feb. 1991, pp. 481-485.

H.S. Hou, "A fast recursive algorithm for computing the discrete cosine transform," IEEE Trans. ASSP, Oct. 1987, pp. 1455-1461.

C.W. Kok, "Fast Algorithm for Computing 2D Discrete Cosine Transform," unpublished article, pp. 1-4.

R. Mahapatra, A. Kumar, and B. Chatterji, "Performance Analysis of 2-D Inverse Fast Cosine Transform Employing Multiprocessors," article, pp. 1-31.
