Abstract

Learning-based super-resolution can reconstruct the high resolution image results in high quality. However, building a training-based super resolution system has a huge computing complexity in times and spaces. In this paper, we propose a GPU-based super resolution system using Super Resolution Through Neighbor Embedding (SRNE) and GPU-Brute Force (BF)KNN search method to reduce the processing time.

SRNE is that the local geometries of image patched are similar in two distinct HR and LR feature spaces. We use BF KNN method to search training database the similar image patch and reconstruct the HR training patches. Experiments show that our results can improve more than 10x speed up computation times.

1.Introduction

Super resolution is the problem of generating a high resolution image from one or more low-resolution images. One of the most popular methods is “Super Resolution through Neighbor Embedding”(SRNE)[1]. This paper is on generating a high-resolution image (HR) from a single low-resolution image (LR) with a set of many training data images from scenes or different type. Chang et al. proposed method that is generated a high resolution image patch does not depend on only one of the nearest neighbors in the training set.

In SRNE, there are some computational problems. For example, it is very challenging to build result image in a few minutes time, learning-based can‘t process online. In this paper, we propose a training-based super resolution in GPU parallel computing. Because SRNE has high parallel in feature extraction, build training data, KNN search method, grain matrix and reconstruct HR image.In KNN search, we apply fast KNN search in GPU[2], this method propose a CUDA implementation of the “bruteforce” KNN search to solving this problem.

We consider SRNE in GPU computing that will have some performance improvement. And build a large training patch dataset will reconstruct high quality result images.

The rest of the paper is organized as follows. In Sect. 2 we discuss related works in super resolution and GPU computing. And Sect. 3 we describe our SRNE with GPU in details. We report the experiment results and compare performance evaluation between CPU version and GPU version SRNE in Sect 4. Finally, discuss and conclusion in Sect. 5.

2.Related work

Resolution enhancement using smoothing or interpolation method is widely used in image processing. Smoothing is to use some filters such as Gaussian Filter and Median Filter. Interpolation method is usually achieved by using bilinear interpolation or bicubic interpolation, which are easy to implement. However, both methods may cause some artifacts.

Recently, many methods have been proposed. Some methods make use of training images [5, 6] or require strong priors [7]. Each of these methods in high resolution image comes from only one nearest neighbor.

Learning-based super resolution can recover the high resolution image in high quality. [2] use random projection tree with manifold learning and use GPU-base knn search to accelerate the procedure.

In the paper we implement, it proposes a method use multiple training images to generate each image patch in high resolution image. We find k most similar patches to composite final high resolution patch given a specific weighting vector. This property makes generalization over the training samples is possible and hence fewer training samples are required. Most important of all, the method can be highly-parallelized using GPU.

3.System Architecture

3.1System overview

The system architecture overview is show in Fig. 1. There are two parts in this architecture, LLE processing in left-side and training processing in right-side.

Fig.1 system flowchart

In system flowchart detail, we show in Fig. 2, each state has different color representation. Red state denote compute in GPU, light blue state is the addition function in SRNE method, brown means training data processing.

Fig. 2 system flowchart in detail

At the pre-processing stage, we apply color transform and calculate patch complexity, a simple check mechanism – patch complexity measurement, such as patch mean and variance. Because we believe that some patches are too smooth or no evidently different than neighbor patches if those run into LLE process will be increased more computing. In GPU computation, there widely classify three categories, global memory, shared memory and register. Global memory can store large data as video card support, usually 128MB to 1GB in general market. Shared memory is opposite, it’s local memory and only at the same SM can use. So we need to reduce the memory usage and more efficiency.

3.2LLE processing

First, given some variable definition, an input of low-resolution image, we estimate the high resolution image from training data set of one or more low-resolution image and the corresponding high-resolution images .

As in LLE, it can be summarized as four parts:

  1. Compute the input low-resolution patch feature vector.
  2. Find the K nearest neighbor in KNN search method.
  3. Compute the reconstruction weights of the neighbors that minimize the reconstruction error.
  4. Compute the high-resolution embedding patch using the appropriate high-resolution featuresof theKnearest neighbors and thereconstruction weights.

3.3K-NN search using CUDA

K nearest neighbor (KNN) is a method for classifying object based on closing training examples in feature space. In our method, we use Brute Force KNN search[2] to find k nearest LR patches in training data for each patch in query image, fig. 3 is an example of KNN search problem solution.

Fig. 3 Example graph of thekNN search problem in k =3
Red cross is query point, blue points are reference points

Letbe the set of training patches feature vectors, be the set of query image patch feature vector. Our task is to search the k nearest neighbors of each query patch given a specific distance. In this project we use 2-norm distance as metrix.

In our implementation,we use brute (BF) search to find KNN. The BF algorithm is the following:

1.Compute all the distance between
2.Sort the computed distance.
3.Choose the k training patches feature vector corresponding to k smallest distance
4.Repeat step 1. to 3. For all query image patch

The two main parts BF method ofKNN are computation of distance and sorting. Each of these two parts is highly-parallelizable.This property makes the BF method perfectly suitable for CUDA implementation. The computation of distance can be fully parallelized since the distance between pairs of points are independent. The sorting part can also be implemented in CUDA. Each thread sorts all distances computed for a given query patch.

3.4Neighbor embedding method

For each patch, in the low-resolution image, first compute the reconstruction weights of its neighbors in by minimizing the local reconstruction error. The HR embedding is then estimated from the training image pairs by preserving local geometry. At last, we overlap training HR patches in the result image.

The neighbor embedding algorithm as follows:

  1. For each patch , in image :
  2. Find the set of K nearest neighbors in , using [2].
  3. Compute the reconstruction weights of the neighbors that minimize the error of reconstructing
  4. Compute the high-resolution embedding using the appropriate high-resolution features of the K nearest neighbors and the reconstruction weights, example in fig.4.
  5. Construct the target high-resolution image byenforcing local compatibility and smoothnessconstraintsbetween adjacent patches obtained instep 1C.

Fig. 4 Neighbor embedding procedureapplied to a low-resolution patch for3X magnification

4Results

4.1Setup

The computer use for show results and evaluation performance is an Intel Duo CPU3 GHz, with 4GB of RAM memory. The graphic card we used is a NVIDIA GeForce 9600 GT with 2G memory and 64 multiprocessors.

4.2Bilinear vs. SRNE with GPU

In results, we show some results between bilinear and our SRNE method in GPU. Second, we give the processing time between CPU and GPU version.

Fig. 5 shows the bilinear interpolation and our results.We apply neighbor embedding to a small 3x3 patch from a low-resolution house image in fig.5(a). Other results show in fig.6 to fig. 10.

Fig 5.(a) build / Fig. 5(b) bilinear 2X
Fig. 5(c)SRNE 2X

4.3Performance evaluation

In this section, we analyze the CPU and GPU processing time in SRNE.The scaling factors are all 2 andthe databasecontains 46370 patches. Table 1 show the performance between CPU and GPU version SRNE. There are four test image dataset in this evaluation, original size from 50x50 to 171x120. The speed-up detail which show in Table 1.

Table 1
Image / Size / CPU(s) / GPU(s) / Speed up
house / 50x50 / 35.42 / 1.93 / 18.37
testing01 / 124x106 / 162.99 / 7.55 / 21.60
paint / 125x137 / 281.95 / 9.67 / 29.15
butterfly / 156x100 / 176.11 / 8.87 / 19.85
house01 / 171x120 / 364.96 / 11.42 / 31.97
Fig. 6(a) house / Fig. 6(b) Result(2x)
Fig. 7(a) testing01 / Fig. 7(b) Result(2x)
Fig. 8(a) paint / Fig. 8(b)Result(2x)
Fig. 9(a) butterfly
Fig. 9(b) Result(2x)
Fig. 10(a) house01
Fig. 10(b) Result(2x)

5Conclusion

In this paper, we propose a GPU-based super resolution system using Super Resolution Through Neighbor Embedding (SRNE) and GPU-Brute Force (BF) KNN search. We show that using CUDA to do super resolution can accelerate up to a factor between 20 and 30 compared to CPU-base implementation. This improvement allows us to get higher quality high resolution image with more training images within short time. Besides, wespeed up whole procedure with pre-processing the image by checking patch complexity. Image patches with lower complexity represent its smoothness. We skip the CUDA process and just do bilinear interpolation for these smooth patches.

Future work is that we want to accelerate the linear system solving part in CUDA since we just implement in CPU currently.

References

[1]H. Chang, D.Y Yeung, Y. Xiong, “Super-Resolution Through Neighbor Embedding”, CVPR 2004, pp. 275 – 282, 2004

[2]J.Pu, J. Zhang, P.Guo and X. Yuan, “Interactive Super-resolution through Neighbor Embedding,”ACCV 2009, pp. 496 – 505, 2009

[3]V. Garcia and E. Debreuve and M. Barlaud. “Fast KNearest Neighbor Search using GPU”, CVPR Workshop on Computer Vision on GPU, June 2008

@INPROCEEDINGS{Garcia_2008_CVGPU,
author = {V. Garcia and E. Debreuve and M. Barlaud},
title = {Fast k nearest neighbor search using GPU},
booktitle = {CVPR Workshop on Computer Vision on GPU},
year = {2008},
address = {Anchorage, Alaska, USA},
month = {June}
}

[4]V. Garcia. Ph.D. Thesis: Suivid'objetsd'intérêtdansuneséquenced'images : des points saillants aux mesuresstatistiquesUniversité de Nice - Sophia Antipolis, Sophia Antipolis, France, December 2008

[5]W.T. Freeman, T.R. Jones, and E.C. Pasztor.“Example-based Super-Resolution”, Computer Graphics andApplications, 22(2):56–65, 2002.

[6]W.T. Freeman and E.C. Pasztor, “Learning Low-level vision”, ICCV 1999, volume 2, pages 1182–1189, September 1999.

[7]R.R. Schultz and R.L. Stevenson, ”A Bayesian Approachto Image Expansion forImproved Definition”, IEEETransactions on Image Processing, 3(3):233–242, 1994.