EE-5356 DIGITAL IMAGE PROCESSING

Metric Structural Similarity (SSIM) and

Universal Image Quality Index (UIQI)

Ajaykumar Sharma

A perceptual image quality assessment metric: structural similarity (SSIM)

This project concerns the structural similarity (SSIM) index, which represents perceptual image quality based on structural information. SSIM is an objective image quality metric that agrees with perceived quality better than traditional quantitative measures such as MSE and PSNR. The project demonstrates SSIM-based image quality assessment and illustrates its validity in terms of human visual perception. Reviewing the paper listed below will help in understanding SSIM and its applications.

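(a)  A general form of SSIM is

$$\mathrm{SSIM}(x,y) = [l(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\gamma},$$

[Note that α>0, β>0 and γ>0 are parameters used to adjust the relative importance of the three components].

where x and y are image patches and

$$l(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1}, \quad c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \quad \text{and} \quad s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}.$$

l(x,y) is the luminance comparison (eq. 6), c(x,y) is the contrast comparison (eq. 9), and s(x,y) is the structure comparison (eq. 10). C1, C2 and C3 are constants. µx, µy, σx, σy and σxy are defined in Eqs. 14, 15 and 16 of the reference paper. The Gaussian weighting function has the form

$$w(i,j) \propto \exp\!\left(-\frac{i^2 + j^2}{2\sigma_w^2}\right), \qquad \sum_{i,j} w(i,j) = 1,$$

with (i, j) measured from the window center; the reference paper uses a standard deviation σ_w of 1.5 samples.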
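As a minimal sketch of the Gaussian-weighted local statistics (Eqs. 14-16 of the reference paper), the following Python code builds the 11x11 window and computes the weighted mean, standard deviation and covariance for one pair of patches. Python is used only for illustration (the assignment itself asks for Matlab). The function names gaussian_window and weighted_stats are hypothetical; the standard deviation of 1.5 samples and the unit-sum normalization are taken from the reference paper.

```python
import math

def gaussian_window(size=11, sigma=1.5):
    """Return a size x size circular-symmetric Gaussian window with unit sum."""
    half = size // 2
    raw = [[math.exp(-((r - half) ** 2 + (c - half) ** 2) / (2.0 * sigma ** 2))
            for c in range(size)] for r in range(size)]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]

def weighted_stats(x, y, w):
    """Weighted mean, std. dev. and covariance of two patches the size of w."""
    triples = [(x[r][c], y[r][c], w[r][c])
               for r in range(len(w)) for c in range(len(w[0]))]
    mu_x = sum(wi * xi for xi, _, wi in triples)
    mu_y = sum(wi * yi for _, yi, wi in triples)
    var_x = sum(wi * (xi - mu_x) ** 2 for xi, _, wi in triples)
    var_y = sum(wi * (yi - mu_y) ** 2 for _, yi, wi in triples)
    cov = sum(wi * (xi - mu_x) * (yi - mu_y) for xi, yi, wi in triples)
    return mu_x, mu_y, math.sqrt(var_x), math.sqrt(var_y), cov
```

Because the weights sum to one, a constant patch yields a weighted mean equal to that constant and a standard deviation of (numerically) zero.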

For more information, students are referred to the paper, Z. Wang et al. "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004, which can be downloaded from http://www.cns.nyu.edu/~lcv/ssim/ (Also from IEEE Xplore).

Write a Matlab function named my_ssim whose inputs are two images and the SSIM parameters, and whose outputs are SSIM_metric and SSIM_map for the two images. SSIM_metric is the mean value of SSIM_map, which is computed with a local moving window (11x11 pixels). You can download the Matlab implementation of SSIM at http://www.cns.nyu.edu/~lcv/ssim/ as a reference.

(b)  Set , as in the paper and apply your function to the different distorted Lena images (512x512) with the same mean square error (MSE). The test images can be downloaded at the link “Universal image quality index” at http://www.cns.nyu.edu/~zwang/. Compute the SSIM_metric and show the SSIM_map. You will find that the SSIM_metric is more correlated to human perception of quality compared with MSE.

(c)  Fix as in (b) and set as 1. Choose any 5 pairs of , for example

[] , [],[],[],[] and apply your function to the distorted Lena images (you can use all or some of the images). Then find the to produce the SSIM_metric most correlated with your perception of the quality. There is no single right answer; the choice depends entirely on your own opinion of the quality.

SSIM was introduced to the JVT at the Hannover (Germany) meeting in April 2008. See document JVT-AB31 (W.S. Kim, Z. Li, P. Pahalawatta, and A.M. Tourapis, Dolby Lab, Burbank, CA).

Universal Image Quality Index

1.  Z. Wang and A.C. Bovik, “A universal image quality index,” IEEE Signal Processing Letters, vol. 9, pp. 81-84, March 2002.

2.  Y-G. Wang et al., “Robust dual watermarking algorithm for AVS video,” Signal Processing: Image Communication (SP:IC), vol. 24, issue 4, pp. 333-344, April 2009. Can be accessed from SEL using Science Direct.

Read the first paper in detail, especially equations 1-3, and see figures 1 and 2. Using the Lena image, generate various corrupted/contaminated images. List Q for the various distortions (see table 1).

Evaluate SSIM for all these corrupted images. Prepare a table similar to table 1 showing the different types of distortions, MSE, Q and SSIM. What are your conclusions?

Hint: Access the MATLAB software given in the link below:

“http://www.cns.nyu.edu/~zwang/files/research/quality_index/demo.html”
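The table requested above sets MSE side by side with Q and SSIM. As a small illustration of why MSE alone cannot separate distortion types, the Python sketch below (toy pixel values chosen for this example, not taken from the Lena images) applies two different distortions that give exactly the same MSE.

```python
def mse(x, y):
    """Mean squared error between two equally long pixel sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

# Toy reference signal (hypothetical values, for illustration only).
reference = [52, 60, 75, 90, 110, 130, 150, 170]

# Distortion 1: uniform brightness shift of +5.
shifted = [p + 5 for p in reference]

# Distortion 2: alternating +5 / -5 perturbation (noise-like).
noisy = [p + (5 if i % 2 == 0 else -5) for i, p in enumerate(reference)]

print(mse(reference, shifted), mse(reference, noisy))  # both are 25.0
```

Both distortions change every sample by 5 in magnitude, so MSE reports the same value even though one preserves the structure of the signal and the other does not.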

Data:

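x = {xi, i = 1, 2, …, N}: original image

y = {yi, i = 1, 2, …, N}: reconstructed or corrupted image

The proposed quality index is defined as:

$$Q = \frac{4\,\sigma_{xy}\,\bar{x}\,\bar{y}}{(\sigma_x^2 + \sigma_y^2)\,(\bar{x}^2 + \bar{y}^2)}$$

where

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad \bar{y} = \frac{1}{N}\sum_{i=1}^{N} y_i,$$

$$\sigma_x^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2, \qquad \sigma_y^2 = \frac{1}{N-1}\sum_{i=1}^{N}(y_i - \bar{y})^2, \qquad \sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y}).$$

Q can also be defined as the product of three components:

$$Q = \frac{\sigma_{xy}}{\sigma_x\sigma_y}\cdot\frac{2\,\bar{x}\,\bar{y}}{\bar{x}^2 + \bar{y}^2}\cdot\frac{2\,\sigma_x\sigma_y}{\sigma_x^2 + \sigma_y^2}$$

The three factors are labeled [I], [II] and [III], in that order.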

Where,

[I] is the correlation coefficient between x and y; it measures their degree of linear correlation and has dynamic range [-1, 1].

[II] measures how close the mean luminance of x and y is; its range is [0, 1].

[III] measures how similar the contrasts of the images x and y are; its range is also [0, 1].
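The definition of Q and its three-factor form can be checked numerically. The Python sketch below is an illustration only (the function name uiqi and the sample values are made up for this example). It computes Q directly and also returns the factors [I], [II] and [III], so that their product can be compared with Q.

```python
import math

def uiqi(x, y):
    """Return (Q, correlation, luminance, contrast) for two pixel sequences.

    Assumes neither signal is constant and the means are not both zero;
    otherwise a denominator is zero and Q is undefined.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mean_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

    q = 4 * cov * mean_x * mean_y / ((var_x + var_y) * (mean_x ** 2 + mean_y ** 2))
    correlation = cov / math.sqrt(var_x * var_y)                    # factor [I]
    luminance = 2 * mean_x * mean_y / (mean_x ** 2 + mean_y ** 2)   # factor [II]
    contrast = 2 * math.sqrt(var_x * var_y) / (var_x + var_y)       # factor [III]
    return q, correlation, luminance, contrast

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]   # y = 2x: perfectly correlated, different mean and contrast
q, corr, lum, con = uiqi(x, y)
print(q, corr * lum * con)
```

For y = 2x the correlation factor is 1, while the luminance and contrast factors are each 0.8, so Q is 0.64 even though the two signals are perfectly correlated.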

Measure the statistical features locally and then combine them to get the global Q:

$$Q = \frac{1}{M}\sum_{j=1}^{M} Q_j$$

where Q_j is the local quality index at step j and M is the total number of steps. Use a sliding window of size BxB. Choose B=8.

Figure: (512x512) image divided into blocks of size (BxB).

Move the sliding window of size (BxB) one pixel at a time (horizontally until the right border is reached and vertically until the bottom border is reached). Here M is the number of steps.

For an image of size (LxL) with a sliding window of size (BxB), the number of steps M the sliding window takes to cover the entire image is [(L-B)+1]².
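The step count can be checked with a short Python sketch. This is an illustration only: the names num_steps and sliding_windows are hypothetical, and the window is assumed to advance one pixel per step.

```python
def num_steps(L, B):
    """Number of positions of a BxB window sliding one pixel at a time over an LxL image."""
    return (L - B + 1) ** 2

def sliding_windows(image, B):
    """Yield every BxB block of a 2-D list, flattened row by row."""
    rows, cols = len(image), len(image[0])
    for r in range(rows - B + 1):          # top edge of the window
        for c in range(cols - B + 1):      # left edge of the window
            yield [image[r + i][c + j] for i in range(B) for j in range(B)]

print(num_steps(512, 8))  # 255025

# Small check: a 6x6 image with a 3x3 window gives (6 - 3 + 1)^2 = 16 windows.
tiny = [[r * 6 + c for c in range(6)] for r in range(6)]
print(len(list(sliding_windows(tiny, 3))))  # 16
```

For the 512x512 Lena image with B=8, the local index is therefore evaluated 505 x 505 times before averaging.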

Theory for SSIM:

The structural similarity (SSIM) method is a recently proposed approach for image quality assessment. It is widely believed that the statistical properties of the natural visual environment play a fundamental role in the evolution, development and adaptation of the human visual system (HVS). An important observation about natural image signals is that they are highly structured. By “structured signal” we mean that the signal samples exhibit strong dependencies among themselves, especially when they are spatially proximate. These dependencies carry important information about the structure of the objects in the visual scene. The principal hypothesis of structural similarity based image quality assessment is that the HVS is highly adapted to extract structural information from the visual field, and therefore a measurement of structural similarity or distortion should provide a good approximation to perceived image quality.

The SSIM index is a method for measuring the similarity between two images. It is a full-reference metric; in other words, the measurement of image quality is based on an initial uncompressed or distortion-free image as reference. SSIM is designed to improve on traditional metrics such as PSNR and MSE, which have proved to be inconsistent with human visual perception.

The first instantiation of the structural similarity-based method was made in. The goal of image quality assessment research is to design methods that quantify the strength of the perceptual similarity between the test and the reference images. Researchers have taken a number of approaches to quality assessment.

The first approach is called the error sensitivity approach. In this approach the test image data is considered to be the sum of the reference image and an error signal. It is assumed that the loss of perceptual quality is directly related to the visibility of the error signal. Most HVS-based image quality assessment models attempt to weigh and combine different aspects of the error signal according to their respective visual sensitivities, which are usually determined by psychophysical measurement. The problem with this approach is that larger visible differences do not necessarily imply lower perceptual quality.

The second approach assumes that the observation process efficiently extracts and makes use of the information represented in natural scenes, whose statistical properties are believed to play a fundamental role in the development of the HVS. An example of the second approach is the structural similarity based image quality assessment method, which is based on the observation that natural images are highly “structured,” meaning that the signal samples have strong dependencies among themselves. These dependencies carry important information about the structure of the objects in the visual scene.

Figure: 3.2 Structural SIMilarity (SSIM) Measurement System [6]

Structural Similarity (SSIM) Index

The system diagram of the SSIM quality assessment system is shown in figure 3.2. Suppose x and y are two nonnegative image signals that have been aligned with each other. The purpose of the system is to provide a similarity measure between them. If we consider one of the signals to have perfect quality, then the similarity measure can serve as a quantitative measurement of the quality of the second signal.

The system separates the task of similarity measurement into three comparisons: luminance, contrast and structure. First, the luminance of each signal is compared. Assuming discrete signals, this is estimated as the mean intensity

$$\mu_x = \frac{1}{N}\sum_{i=1}^{N} x_i$$

and

$$\mu_y = \frac{1}{N}\sum_{i=1}^{N} y_i$$

The luminance comparison function ℓ(x,y) is then a function of µx and µy:

ℓ(x, y) = ℓ (µx, µy)

Second, the mean intensity is removed from the signal. In discrete form, the resulting signal x - µx corresponds to the projection of the vector x onto the hyperplane defined by

$$\sum_{i=1}^{N} x_i = 0 \qquad (3.4)$$

The standard deviation (the square root of the variance) is used as an estimate of the signal contrast. An unbiased estimate in discrete form is given by

$$\sigma_x = \left(\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)^2\right)^{1/2}$$

The contrast comparison c(x, y) is then the comparison of σx and σy:

c(x, y) = c(σx, σy)

Third, the signal is normalized (divided) by its own standard deviation, so that the two signals being compared have unit standard deviation. The structure comparison s(x,y) is conducted on these normalized signals, (x - µx)/σx and (y - µy)/σy:

s(x, y) = s((x - µx)/σx, (y - µy)/σy)

Finally, the three components are combined to yield an overall similarity measure:

S (x, y) = f (ℓ(x, y), c(x, y), s(x, y))

An important point is that the three components are relatively independent. For example, the change of luminance and/or contrast will not affect the structures of images.

To complete the definition of the similarity measure in the equation above, we need to define the three functions ℓ(x, y), c(x, y) and s(x, y) as well as the combination function f(·). The similarity measure should also satisfy the following conditions.

1.  Symmetry: S(x,y) = S(y,x). Since the purpose is to quantify the similarity between two signals, exchanging the order of the input signals should not affect the resulting similarity measurement.

2.  Boundedness: S(x,y) <= 1. Boundedness is a useful property for a similarity metric, since an upper bound can indicate how close the two signals are to being perfectly identical. This is in contrast with most signal-to-noise ratio type measurements, which are typically unbounded.

3.  Unique maximum: S(x, y) = 1 if and only if x = y (in discrete representations xi = yi, for all i= 1,2…….,N ). In other words, the similarity measure should quantify any variations that may exist between the input signals. The perfect score is achieved only when the signals being compared are exactly the same.

For luminance comparison, we define

$$\ell(x,y) = \frac{2\mu_x\mu_y + C_1}{\mu_x^2 + \mu_y^2 + C_1} \qquad (3.9)$$

where the constant C1 is included to avoid instability when µx² + µy² is very close to zero. Specifically, we choose

C1 = (K1 L)²

where L is the dynamic range of the pixel values, e.g. 255 for 8-bit grayscale images, and K1 is a small constant. Similar considerations also apply to the contrast and structure comparisons.

According to Weber’s law, the magnitude of a just-noticeable luminance change ΔI is approximately proportional to the background luminance I for a wide range of luminance values. In other words, the HVS is sensitive to the relative luminance change, not the absolute luminance change. Letting R represent the size of the luminance change relative to the background luminance, we can rewrite the luminance of the distorted signal as µy = (1 + R) µx. Substituting this into (3.9) and dividing numerator and denominator by µx² gives

$$\ell(x,y) = \frac{2(1+R) + C_1/\mu_x^2}{1 + (1+R)^2 + C_1/\mu_x^2}$$

If we assume C1 is small enough (relative to µx²) to be ignored, then ℓ(x, y) is a function only of R rather than of ΔI = µy - µx, which is qualitatively consistent with Weber’s law.
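The Weber's-law behaviour can be seen numerically. In the Python sketch below (an illustration only; the function name luminance is hypothetical, and K1 = 0.01 is the default value from the reference paper, not a value stated in this document), the same relative change R = 0.1 is applied at three background levels.

```python
def luminance(mu_x, mu_y, c1=0.0):
    """Luminance comparison l(x, y) computed from the two mean intensities."""
    return (2.0 * mu_x * mu_y + c1) / (mu_x ** 2 + mu_y ** 2 + c1)

R = 0.1                      # relative luminance change
C1 = (0.01 * 255) ** 2       # assumption: K1 = 0.01, L = 255

for mu_x in (50.0, 100.0, 200.0):
    mu_y = (1.0 + R) * mu_x
    # With c1 = 0 the value depends only on R; with C1 > 0 it drifts slightly.
    print(mu_x, luminance(mu_x, mu_y), luminance(mu_x, mu_y, C1))
```

With C1 = 0 all three backgrounds give the same value, 2(1+R)/(1+(1+R)²); with the small C1 the values differ only slightly, and less so as µx grows.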

The contrast comparison function takes a similar form:

$$c(x,y) = \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}$$

where C2 is a non-negative constant,

C2 = (K2 L)²

and K2 satisfies K2 ≪ 1. This definition again satisfies the three properties listed above. An important feature of this function is that, for the same amount of contrast change Δσ = σy – σx, the measure is less sensitive when the base contrast σx is high than when it is low. This is consistent with the contrast-masking feature of the HVS.
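The contrast-masking property can be illustrated with a few lines of Python. This is a sketch only: the function name contrast is hypothetical, and K2 = 0.03 is the default from the reference paper rather than a value given in this document. The same contrast change Δσ = 10 is applied at a low and a high base contrast.

```python
C2 = (0.03 * 255) ** 2   # assumption: K2 = 0.03, L = 255

def contrast(sigma_x, sigma_y, c2=C2):
    """Contrast comparison c(x, y) computed from the two standard deviations."""
    return (2.0 * sigma_x * sigma_y + c2) / (sigma_x ** 2 + sigma_y ** 2 + c2)

delta = 10.0
low_base = contrast(10.0, 10.0 + delta)     # base contrast 10
high_base = contrast(100.0, 100.0 + delta)  # base contrast 100
print(low_base, high_base)
```

The value for the high base contrast stays much closer to 1 than the value for the low base contrast, so the same Δσ is penalized less when the base contrast is high.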

Structure comparison is conducted after luminance subtraction and variance normalization. Specifically, we associate the two unit vectors (x - µx)/σx and (y - µy)/σy, each lying in the hyperplane defined by (3.4), with the structure of the two images. The correlation (inner product) between them is a simple and effective measure of structural similarity. The correlation between (x - µx)/σx and (y - µy)/σy is equivalent to the correlation coefficient between x and y. Thus, we define the structure comparison function as follows:

$$s(x,y) = \frac{\sigma_{xy} + C_3}{\sigma_x\sigma_y + C_3}$$

As in the luminance and contrast measures, a small constant C3 is introduced in both the denominator and the numerator. In discrete form, σxy can be estimated as

$$\sigma_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \mu_x)(y_i - \mu_y)$$

Geometrically, the correlation coefficient corresponds to the cosine of the angle between the vectors (x - µx)/σx and (y - µy)/σy. Note that s(x, y) can take on negative values.
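A short Python sketch makes the sign behaviour of the structure term concrete. It is an illustration only: the function name structure and the sample values are made up, and C3 defaults to 0 so that s(x, y) reduces to the plain correlation coefficient.

```python
import math

def structure(x, y, c3=0.0):
    """Structure comparison s(x, y) using the unbiased (N - 1) estimates.

    With c3 = 0 this is the correlation coefficient and is undefined
    (division by zero) when either signal is constant.
    """
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    sigma_x = math.sqrt(sum((a - mu_x) ** 2 for a in x) / (n - 1))
    sigma_y = math.sqrt(sum((b - mu_y) ** 2 for b in y) / (n - 1))
    sigma_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    return (sigma_xy + c3) / (sigma_x * sigma_y + c3)

x = [10.0, 60.0, 110.0, 160.0]
stretched = [2.0 * a + 5.0 for a in x]   # luminance and contrast change only
inverted = x[::-1]                       # same values, reversed structure

print(structure(x, stretched))   # close to +1
print(structure(x, inverted))    # close to -1
```

A change of mean and contrast alone leaves the structure term at its maximum, while reversing the pattern drives it to the negative end of its range.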

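Finally, we combine the three comparisons, luminance (eq. 6), contrast (eq. 9) and structure (eq. 10) in the reference paper's numbering, and name the resulting similarity measure the SSIM index between signals x and y:

$$\mathrm{SSIM}(x,y) = [\ell(x,y)]^{\alpha}\,[c(x,y)]^{\beta}\,[s(x,y)]^{\gamma}$$

where α > 0, β > 0 and γ > 0 are parameters used to adjust the relative importance of the three components. It is easy to verify that this definition satisfies the three conditions given above. Here, α = β = γ = 1 and C3 = C2/2. This results in a specific form of the SSIM index:

$$\mathrm{SSIM}(x,y) = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}$$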
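The simplification follows from a short calculation. With α = β = γ = 1 and C3 = C2/2, the contrast and structure terms share the factor 2σxσy + C2, which cancels:

```latex
\begin{align*}
c(x,y)\,s(x,y)
  &= \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}
     \cdot \frac{\sigma_{xy} + C_2/2}{\sigma_x\sigma_y + C_2/2} \\
  &= \frac{2\sigma_x\sigma_y + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}
     \cdot \frac{2\sigma_{xy} + C_2}{2\sigma_x\sigma_y + C_2} \\
  &= \frac{2\sigma_{xy} + C_2}{\sigma_x^2 + \sigma_y^2 + C_2}, \\[4pt]
\mathrm{SSIM}(x,y)
  &= \ell(x,y)\,c(x,y)\,s(x,y)
   = \frac{(2\mu_x\mu_y + C_1)(2\sigma_{xy} + C_2)}
          {(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)}.
\end{align*}
```

A practical consequence is that the simplified form needs only the means, the variances and the covariance; the standard deviations σx and σy never have to be computed separately.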

In practice, one usually requires a single overall quality measure of the entire image. We use a mean SSIM (MSSIM) index to evaluate the overall image quality:

$$\mathrm{MSSIM}(X,Y) = \frac{1}{M}\sum_{j=1}^{M}\mathrm{SSIM}(x_j, y_j)$$

where X and Y are the reference and the distorted images, respectively; xj and yj are the image contents at the jth local window; and M is the number of local windows in the image. The SSIM index is better at capturing poor-quality regions.
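The whole pipeline can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the reference implementation. It uses a plain (unweighted) BxB window with the N-1 estimates given in the theory above instead of the 11x11 Gaussian window, the specific form with α = β = γ = 1 and C3 = C2/2, and K1 = 0.01, K2 = 0.03 as in the reference paper. The names ssim_patch and mssim and the toy images are hypothetical.

```python
def ssim_patch(x, y, L=255, k1=0.01, k2=0.03):
    """SSIM of two flattened patches (specific form, alpha = beta = gamma = 1)."""
    n = len(x)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / (n - 1)
    var_y = sum((b - mu_y) ** 2 for b in y) / (n - 1)
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / (n - 1)
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2))

def mssim(img_x, img_y, B=8):
    """Mean SSIM over all BxB windows, moving one pixel at a time."""
    rows, cols = len(img_x), len(img_x[0])
    values = []
    for r in range(rows - B + 1):
        for c in range(cols - B + 1):
            px = [img_x[r + i][c + j] for i in range(B) for j in range(B)]
            py = [img_y[r + i][c + j] for i in range(B) for j in range(B)]
            values.append(ssim_patch(px, py))
    return sum(values) / len(values)

# Toy 12x12 images (hypothetical data): a smooth ramp and a noisy copy of it.
ref = [[10 * r + 5 * c for c in range(12)] for r in range(12)]
noisy = [[min(255, max(0, p + (20 if (r + c) % 2 == 0 else -20)))
          for c, p in enumerate(row)] for r, row in enumerate(ref)]

print(mssim(ref, ref))     # identical images give the maximum value
print(mssim(ref, noisy))   # distorted image gives a lower value
```

The list of per-window values built inside mssim plays the role of the SSIM map, and its mean is the single overall score.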