Comparative Analysis of Overlay Myanmar Text Regions Detection by Correlation in Spatial and Frequency Domain

1Thu Zar Tint, 2Dr. Nyein Aye 1University of Technology (Yadanarpon Cyber City), Myanmar. 2University of Computer Studies (Mandalay), Myanmar

Abstract--Template matching is one of the important techniques in digital image processing. This paper proposes a correlation model for the verification step that detects and extracts Myanmar caption text from video scenes in real time. Myanmar character templates are matched against video frames using two template matching methods. The main objective of the paper is to present a comparative analysis of the performance of the correlation model in the spatial domain and in the frequency domain when searching for the position of Myanmar text regions in video scenes. The spatial domain approach uses normalized cross correlation, and the frequency domain approach uses the correlation theorem.

Index Terms--Template matching, normalized cross correlation, correlation in frequency domain via fast Fourier transform.

I. INTRODUCTION

A basic problem in image processing is to determine the position of a given pattern in an image or part of an image, the so-called region of interest. Template matching is one of the basic approaches for determining the region of interest, and template matching techniques are widely used in many applications of digital image processing. It is one of the most fundamental means of object detection within an image field: a replica of an object of interest is compared to all unknown objects in the image field.

Many computer vision techniques require matching parts of images. Examples include registering images to determine shift and deformation for reconstruction using stereo and motion, matching to a reference for recognition and extraction of information to index into an image database. Matching is fundamental to computer vision. Template matching is the most important and crucial procedure in object recognition.

The choice of matching approach depends on the nature of the image and the problem to be solved. Template or image matching approaches are generally classified into template-based (area-based) approaches and feature-based approaches.

The rest of the paper is organized as follows. Matching approaches are described in Section II. Section III presents the pre-processing state before matching. Section IV describes the verification of overlay Myanmar text regions. The comparative analysis of the two methods is shown in Section V, followed by the conclusion in Section VI.

II. MATCHING APPROACHES

Several different approaches have been developed to solve this correspondence search problem. Among them, the functions Simple Cross-Correlation (SCC), Normalized Cross-Correlation (NCC), Zero Mean Normalized Cross-Correlation (ZNCC), and Moravec (MOR) are pure similarity measures, so the best match corresponds to the maximum value of these functions. In contrast, the functions Normalized Zero Mean Sum of Squared Differences (NZSSD), Sum of Squared Differences (SSD), Sum of Absolute Differences (SAD), Normalized Sum of Squared Differences (NSSD), and Zero Mean Sum of Absolute Differences (ZSAD) are difference or dissimilarity functions, so the best match is obtained when the returned value is the minimum [1].

A. Sum of Absolute Differences (SAD)

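The Sum of Absolute Differences (SAD) is a simple algorithm for measuring the similarity between the template image T and a sub-image of the source image S. For each pixel location (x, y) in the image, the SAD distance is calculated according to the following equation:

$$\mathrm{SAD}(x, y) = \sum_{i}\sum_{j} \left| S(x+i,\, y+j) - T(i, j) \right| \tag{1}$$

where (i, j) ranges over the pixels of the template T.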

In this algorithm, the smaller the SAD distance between the template image T and a sub-image of the source image S, the closer the match between the template and that sub-image. Therefore, if the distance measured by the SAD function is zero, the local sub-image is identical to the template [2].
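As an illustration of the SAD search described above, the following is a minimal pure-Python sketch, not the implementation used in the paper. Images are assumed to be lists of rows of grey values, and the function names `sad_map` and `best_match` are hypothetical.

```python
def sad_map(source, template):
    """Return the SAD distance for every position where the template fits."""
    src_h, src_w = len(source), len(source[0])
    tpl_h, tpl_w = len(template), len(template[0])
    result = []
    for y in range(src_h - tpl_h + 1):
        row = []
        for x in range(src_w - tpl_w + 1):
            # Sum of absolute differences between the template and the
            # sub-image whose top-left corner is (x, y).
            dist = sum(
                abs(source[y + j][x + i] - template[j][i])
                for j in range(tpl_h)
                for i in range(tpl_w)
            )
            row.append(dist)
        result.append(row)
    return result


def best_match(distance_map):
    """Return (x, y) of the smallest distance, i.e. the closest match."""
    _, x, y = min(
        (value, x, y)
        for y, row in enumerate(distance_map)
        for x, value in enumerate(row)
    )
    return x, y


S = [
    [0, 0, 0, 0],
    [0, 9, 8, 0],
    [0, 7, 6, 0],
]
T = [[9, 8], [7, 6]]

distances = sad_map(S, T)
position = best_match(distances)
```

Here the template occurs exactly once in `S`, so the distance at that position is zero and `best_match` returns its top-left corner.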

B. Sum of Squared Differences (SSD)

The Sum of Squared Differences (SSD) over a small window is one of the simplest and most effective measures of image matching. It has a higher computational complexity than the SAD algorithm because it involves numerous multiplication operations. SSD can be viewed as the squared Euclidean distance. The SSD is defined as follows:

$$d^2_{f,t}(u, v) = \sum_{x,y} \left[ f(x, y) - t(x-u,\, y-v) \right]^2 \tag{2}$$

where f is the image, t is the template positioned at (u, v), and the sum is taken over the pixels (x, y) under the template window.

C. Cross Correlation

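There is a relationship between the sum of squared differences (SSD) and cross correlation template matching approaches. Expanding equation (2) for an image f and a template t positioned at (u, v) gives

$$d^2_{f,t}(u, v) = \sum_{x,y} \left[ f^2(x, y) - 2 f(x, y)\, t(x-u,\, y-v) + t^2(x-u,\, y-v) \right] \tag{3}$$

Note that the term $\sum_{x,y} t^2(x-u,\, y-v)$ is constant. Assuming that the term $\sum_{x,y} f^2(x, y)$ (i.e., the local image energy) is approximately constant, the remaining term (i.e., the cross correlation term)

$$c(u, v) = \sum_{x,y} f(x, y)\, t(x-u,\, y-v) \tag{4}$$

is a measure of the similarity between the image f and the template t: the larger the value of c, the more similar the image and the template are [3].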
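The relationship between SSD and cross correlation can be checked numerically. The sketch below is only an illustration under stated assumptions: images are lists of rows, the template's top-left corner is placed at column u and row v, the loop uses window-relative indices rather than the image coordinates of the expansion, and `window_terms` is a hypothetical helper name.

```python
def window_terms(f, t, u, v):
    """Return (ssd, image_energy, cross_correlation, template_energy) at (u, v)."""
    ssd = image_energy = cross = template_energy = 0
    for y, t_row in enumerate(t):
        for x, t_val in enumerate(t_row):
            f_val = f[v + y][u + x]          # image pixel under the template
            ssd += (f_val - t_val) ** 2
            image_energy += f_val ** 2       # local image energy term
            cross += f_val * t_val           # cross correlation term c(u, v)
            template_energy += t_val ** 2    # constant for a fixed template
    return ssd, image_energy, cross, template_energy


f = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 8, 7, 6],
]
t = [[6, 7], [8, 7]]

# Evaluate the four terms at every position where the template fits.
terms = {
    (u, v): window_terms(f, t, u, v)
    for v in range(len(f) - len(t) + 1)
    for u in range(len(f[0]) - len(t[0]) + 1)
}
```

At every shift, the SSD equals the image energy minus twice the cross correlation plus the template energy. The template energy is the same at every shift, while the image energy changes from window to window, which is why the constant-energy assumption matters.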

There are several disadvantages to using cross correlation for template matching:

If the image energy varies with position, matching using cross correlation can fail. For example, the correlation between the feature and an exactly matching region in the image may be less than the correlation between the feature and a bright spot.
The range of c(u, v) is dependent on the size of the feature.
Cross correlation is not invariant to changes in image amplitude such as those caused by changing lighting conditions across the image sequence [4].

D. Normalized Cross Correlation

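The correlation coefficient overcomes these difficulties by normalizing the image and feature vectors to unit length, yielding a cosine-like correlation coefficient. The normalized cross correlation is defined as follows:

$$\gamma(u, v) = \frac{\sum_{x,y} \left[ f(x, y) - \bar{f}_{u,v} \right] \left[ t(x-u,\, y-v) - \bar{t} \right]}{\sqrt{\sum_{x,y} \left[ f(x, y) - \bar{f}_{u,v} \right]^2 \sum_{x,y} \left[ t(x-u,\, y-v) - \bar{t} \right]^2}} \tag{5}$$

where $\bar{f}_{u,v}$ denotes the mean value of f(x, y) within the area of the template t shifted to (u, v), which is calculated by

$$\bar{f}_{u,v} = \frac{1}{N_x N_y} \sum_{x=u}^{u+N_x-1} \sum_{y=v}^{v+N_y-1} f(x, y) \tag{6}$$

for a template of size $N_x \times N_y$. With similar notation, $\bar{t}$ is the mean value of the template t. Due to this normalization, $\gamma(u, v)$ is invariant to changes in brightness or contrast of the image, which are related to the mean value and the standard deviation. It is more robust than other similarity measures, such as simple covariance or the sum of absolute differences (SAD). Its main drawback is that it is computationally expensive [5].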
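The difference between plain and normalized cross correlation can be seen on a tiny example. The following is a minimal pure-Python sketch, not the paper's implementation. Images are assumed to be lists of rows, the helper names are hypothetical, and returning 0.0 for a window with no variation is an assumption made here to avoid division by zero.

```python
import math


def flatten_window(f, u, v, height, width):
    """Return the pixels of f under a window whose top-left corner is (u, v)."""
    return [f[v + y][u + x] for y in range(height) for x in range(width)]


def cross_correlation(f, t, u, v):
    """Unnormalized cross correlation between the window at (u, v) and t."""
    window = flatten_window(f, u, v, len(t), len(t[0]))
    values = [value for row in t for value in row]
    return sum(a * b for a, b in zip(window, values))


def ncc(f, t, u, v):
    """Normalized cross correlation; 0.0 for a flat window (an assumption)."""
    window = flatten_window(f, u, v, len(t), len(t[0]))
    values = [value for row in t for value in row]
    mean_f = sum(window) / len(window)
    mean_t = sum(values) / len(values)
    numerator = sum((a - mean_f) * (b - mean_t) for a, b in zip(window, values))
    denominator = math.sqrt(
        sum((a - mean_f) ** 2 for a in window)
        * sum((b - mean_t) ** 2 for b in values)
    )
    return numerator / denominator if denominator > 0 else 0.0


image = [
    [1, 2, 9, 9],
    [3, 4, 9, 9],
]
template = [[1, 2], [3, 4]]

raw_match = cross_correlation(image, template, 0, 0)   # exact copy of the template
raw_bright = cross_correlation(image, template, 2, 0)  # uniform bright patch
ncc_match = ncc(image, template, 0, 0)
ncc_bright = ncc(image, template, 2, 0)

# The same match after a brightness and contrast change of the image.
adjusted = [[2 * value + 10 for value in row] for row in image]
ncc_adjusted = ncc(adjusted, template, 0, 0)
```

In this toy image the unnormalized score is higher on the bright patch than on the exact match, whereas the normalized score is highest on the exact match and stays the same after the brightness and contrast change.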

E. Correlation in Frequency Domain

Another matching approach is correlation in the frequency domain. The basic idea is to implement correlation in the frequency domain using the complex conjugate of the Fourier transform of the image. Based on the correlation theorem, correlation in the spatial domain can be obtained by taking the inverse Fourier transform of the product $F^*(u, v) H(u, v)$, where $F^*$ is the complex conjugate of F. This result is formally stated as:

$$f(x, y) \circ h(x, y) \Leftrightarrow F^*(u, v)\, H(u, v) \tag{7}$$

where f(x, y) is the candidate region of size N × M and h(x, y) is the Myanmar character template of size K × L. The procedure consists of the following steps:

1. Multiply the candidate region and the Myanmar character template image by $(-1)^{x+y}$ to centre the transform.

2. Compute F(u, v), the FFT of the zero-padded candidate region from (1).

3. Compute H(u, v), the FFT of the zero-padded Myanmar character template from (1).

4. Multiply F(u, v) by $H^*(u, v)$, the complex conjugate of H(u, v).

5. Compute the inverse FFT of the result in (4).

6. Obtain the real part of the result in (5).

7. Multiply the result in (6) by $(-1)^{x+y}$ [6].
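The core of the steps above (zero padding, forward FFTs, multiplication by a complex conjugate, inverse FFT, real part) can be sketched with NumPy's `numpy.fft` module. This is a hedged illustration rather than the paper's code. It follows the ordering of step 4 (F multiplied by the conjugate of H); for real-valued images the other ordering gives a mirrored correlation surface with the same peak value. The centring multiplications of steps 1 and 7 are left out on the assumption that they only re-centre the spectrum, and the name `correlate_fft` and the padding size are choices made here.

```python
import numpy as np


def correlate_fft(region, template):
    """Cross-correlate region with template via the FFT (real-valued result)."""
    region = np.asarray(region, dtype=float)
    template = np.asarray(template, dtype=float)
    # Zero-pad both arrays to a common size so that circular wrap-around
    # does not mix opposite edges of the region.
    rows = region.shape[0] + template.shape[0] - 1
    cols = region.shape[1] + template.shape[1] - 1
    F = np.fft.fft2(region, s=(rows, cols))    # FFT of the padded region
    H = np.fft.fft2(template, s=(rows, cols))  # FFT of the padded template
    product = F * np.conj(H)                   # multiply by the conjugate
    return np.fft.ifft2(product).real          # inverse FFT, keep the real part


template = np.array([[1.0, 2.0], [3.0, 4.0]])
region = np.zeros((4, 5))
region[1:3, 2:4] = template  # place one copy at row 1, column 2

surface = correlate_fft(region, template)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(surface), surface.shape))
```

With this ordering the peak of the surface sits at the (row, column) offset of the template inside the region. Note that this is unnormalized correlation, so it shares the sensitivity to local image energy discussed for plain cross correlation.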

III. PRE-PROCESSING STATE

Several processes are carried out in this pre-processing state before the matching process is performed.

A. Learning Myanmar Character Templates

One of these pre-processing steps is learning the Myanmar character templates, which is essential for template matching approaches. Because Myanmar characters have distinctive features and many are similar in shape to one another, the templates are learned in six groups according to their shapes and aspect ratios, as shown in Fig. 1. In the pre-processing state, each grey scale Myanmar character template is converted into a binary image using Otsu's threshold method.
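For readers unfamiliar with Otsu's method, the following is a minimal pure-Python sketch of the standard algorithm, which picks the threshold that maximises the between-class variance of the grey-level histogram. It is not taken from the paper. The template is assumed to be a flat list of integer grey values in 0..255, and mapping pixels above the threshold to 1 is an assumption about which side is treated as foreground.

```python
def otsu_threshold(pixels, levels=256):
    """Return the grey level that maximises the between-class variance."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_level, best_variance = 0, -1.0
    weight_low = 0  # number of pixels at or below the candidate threshold
    sum_low = 0     # sum of grey values at or below the candidate threshold
    for level in range(levels):
        weight_low += hist[level]
        sum_low += level * hist[level]
        if weight_low == 0:
            continue  # no pixels in the lower class yet
        weight_high = total - weight_low
        if weight_high == 0:
            break     # every pixel is in the lower class
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_variance, best_level = variance, level
    return best_level


def binarize(pixels, threshold):
    """Map pixels above the threshold to 1 and the rest to 0."""
    return [1 if p > threshold else 0 for p in pixels]


grey = [10, 12, 11, 200, 205, 198, 10, 202]
threshold = otsu_threshold(grey)
binary = binarize(grey, threshold)
```

For this toy input the dark and bright pixels form two well-separated clusters, so the chosen threshold falls between them.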

Figure 1. Groups of trained templates by their similar shapes and aspect ratios: (a) 0.3 < aspect ratio < 0.5, (b) 0.5 < aspect ratio < 0.7, (c) 0.7 < aspect ratio < 1.2, (d) 1.5 < aspect ratio < 1.95, (e) 1.2 < aspect ratio < 1.5, (f) 1.95 < aspect ratio < 4.
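The grouping in Fig. 1 can be expressed as a small lookup. This is an illustrative sketch only: the caption uses strict inequalities and does not say which group a ratio exactly on a boundary belongs to, so treating each upper bound as inclusive is an assumption, and the names `GROUP_BOUNDS` and `template_group` are hypothetical.

```python
# (label, lower bound, upper bound) taken from the Fig. 1 caption,
# listed in increasing order of aspect ratio.
GROUP_BOUNDS = [
    ("a", 0.3, 0.5),
    ("b", 0.5, 0.7),
    ("c", 0.7, 1.2),
    ("e", 1.2, 1.5),
    ("d", 1.5, 1.95),
    ("f", 1.95, 4.0),
]


def template_group(aspect_ratio):
    """Return the Fig. 1 group label for an aspect ratio, or None if outside."""
    for label, low, high in GROUP_BOUNDS:
        # Assumption: the upper bound is inclusive, the lower bound is not.
        if low < aspect_ratio <= high:
            return label
    return None


groups = [template_group(r) for r in (0.4, 1.3, 1.7, 5.0)]
```

A ratio outside 0.3 to 4 has no group in this sketch and returns `None`.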

B. The Transition Maps Generation

In general, overlay text tends to be bright if its background is dark. Conversely, overlay text tends to be dark if its adjacent background is bright. Because of color bleeding, transient colors therefore exist between overlay text and its adjacent background, and the intensities at the boundary of overlay text are observed to change logarithmically. Since the change of intensity at the boundary of overlay text may be small in a low-contrast image, a modified saturation is computed from the change of intensity and the color saturation to determine effectively whether a pixel lies within a transition region. Transition maps are generated by combining the modified color saturation with the change of intensity. The generation of transition maps is explained in detail in [7].

C. Candidate Regions Extraction

After the transition maps are obtained, the next step is to find candidate regions in each transition map. First, the gaps between consecutive transition pixels are filled. Since it is reasonable to assume that overlay regions are generally rectangular, a rectangular bounding box is generated by linking the four points (min_x, min_y), (max_x, min_y), (min_x, max_y), and (max_x, max_y) taken from the link map. To eliminate false positives, the aspect ratio of every candidate region is calculated. If the aspect ratio of a component is smaller than the threshold, that candidate region is considered a false positive and removed; otherwise, it is accepted as a candidate text region.
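The bounding-box and aspect-ratio filtering can be sketched as follows. This is a minimal illustration, not the paper's code. Each connected component is assumed to be a list of (x, y) pixel coordinates, the aspect ratio is assumed to be width divided by height, and the threshold `min_ratio` is a hypothetical parameter because the paper does not state its value.

```python
def bounding_box(points):
    """Return (min_x, min_y, max_x, max_y) for a list of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def aspect_ratio(box):
    """Width over height of a box, counting both end pixels (an assumption)."""
    min_x, min_y, max_x, max_y = box
    width = max_x - min_x + 1
    height = max_y - min_y + 1
    return width / height


def filter_candidates(components, min_ratio):
    """Keep the bounding boxes whose aspect ratio is not below min_ratio."""
    boxes = [bounding_box(component) for component in components]
    return [box for box in boxes if aspect_ratio(box) >= min_ratio]


wide_component = [(2, 5), (40, 5), (2, 14), (40, 14)]
tall_component = [(10, 10), (12, 30)]
kept = filter_candidates([wide_component, tall_component], min_ratio=1.0)
```

The wide component survives the filter, while the tall, narrow one is removed as a false positive.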

IV. OVERLAY MYANMAR TEXT REGIONS VERIFICATION

The candidate regions from the previous step can include not only real text but also similar non-text objects. This step determines whether each boundary-smoothed candidate region is a real overlay Myanmar text region. In this verification step, a similarity value must be found for each boundary-smoothed candidate region. To calculate the similarity value of the objects in each candidate region, two template matching approaches are employed: normalized cross correlation and correlation in the frequency domain using the complex conjugate of the Fourier transform.

The general procedure for finding the similarity value between each candidate region and the pre-trained templates is as follows.

1. Search the candidate region and calculate the aspect ratio of each character or object in it.
2. Resize the pre-trained character templates according to the aspect ratio of the object in the candidate region.
3. Compute the similarity value of each object in the candidate region using either normalized cross correlation or correlation in the frequency domain.
4. If the NCC value is greater than 0.6, the object is assumed to be a Myanmar character and the character count is increased by one. Otherwise, it is assumed to be a non-text region and is discarded.
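The counting rule in the last step can be sketched as below. This is an illustration under stated assumptions, not the paper's code. The similarity function is passed in as a parameter, taking the best score over all templates is an assumption, the resizing step is omitted, and `toy_similarity` is a stand-in that only serves to make the example runnable.

```python
def count_matched_characters(objects, templates, similarity, threshold=0.6):
    """Count the objects whose best template similarity exceeds the threshold."""
    matched = 0
    for obj in objects:
        # Assumption: an object is compared with every template and the
        # highest similarity value decides whether it is a character.
        best = max((similarity(obj, tpl) for tpl in templates), default=0.0)
        if best > threshold:
            matched += 1
    return matched


def toy_similarity(a, b):
    """Stand-in similarity: fraction of equal entries in two equal-length tuples."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


templates = [(1, 0, 1, 1), (0, 1, 1, 0)]
objects = [(1, 0, 1, 1), (0, 0, 0, 0), (0, 1, 1, 1)]
matched = count_matched_characters(objects, templates, toy_similarity)
```

In practice `similarity` would be replaced by a normalized cross correlation or a frequency-domain correlation score between an object and a resized template.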

V. COMPARATIVE ANALYSIS OF EXPERIMENTAL RESULTS

In this section, the performance of the normalized cross correlation method is compared with the performance of correlation in the frequency domain. Since no standard benchmarking dataset is available, various kinds of videos that contain overlay Myanmar text, such as news programmes, movie clips, and sports videos, have been tested. In this paper, we tested five sample videos to compare the experimental results of normalized cross correlation with those of correlation in the frequency domain. First, the process of finding the similarity value is shown in Fig. 2. Tables 1 to 5 then show the experimental results for the five sample videos. Fig. 3 displays the average performance measures over the five sample videos. The total processing time of each sample video for each method is depicted in Fig. 4.

Figure 2. Matching process: (a) candidate region, (b) a template image, (c) NCC plot, (d) DFT plot

Table 1. Performance analysis for sample video 1

Measure / Normalized cross correlation / Correlation in frequency domain
Actual Text Block / 135 / 135
Truly Detected Block / 135 / 134
Falsely Detected Block / 5 / 8
Text Block with missing data / 0 / 1
Recall / 100% / 99.2%
Precision / 96.4% / 94.3%
Misdetection Rate / 0% / 0.75%

Table 2. Performance analysis for sample video 2

Measure / Normalized cross correlation / Correlation in frequency domain
Actual Text Block / 138 / 138
Truly Detected Block / 138 / 125
Falsely Detected Block / 10 / 13
Text Block with missing data / 0 / 13
Recall / 100% / 90.6%
Precision / 93.2% / 90.6%
Misdetection Rate / 0% / 10.4%

Table 3. Performance analysis for sample video 3

Measure / Normalized cross correlation / Correlation in frequency domain
Actual Text Block / 120 / 120
Truly Detected Block / 118 / 115
Falsely Detected Block / 5 / 9
Text Block with missing data / 2 / 5
Recall / 98.3% / 95.8%
Precision / 95.9% / 92.7%
Misdetection Rate / 1.7% / 4.34%

Table 4. Performance analysis for sample video 4

Measure / Normalized cross correlation / Correlation in frequency domain
Actual Text Block / 120 / 120
Truly Detected Block / 119 / 110
Falsely Detected Block / 6 / 20
Text Block with missing data / 1 / 10
Recall / 99.1% / 91.7%
Precision / 95.2% / 84.6%
Misdetection Rate / 0.84% / 9.09%

Table 5. Performance analysis for sample video 5

Measure / Normalized cross correlation / Correlation in frequency domain
Actual Text Block / 111 / 111
Truly Detected Block / 110 / 103
Falsely Detected Block / 1 / 1
Text Block with missing data / 1 / 8
Recall / 99.1% / 92.8%
Precision / 99% / 99.1%
Misdetection Rate / 0.9% / 7.7%

Figure 3. Average Performance Measure for five sample videos

Figure 4. Total Processing Time for five sample videos
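The paper does not state the formulas behind the recall, precision, and misdetection rate rows. The sketch below uses definitions inferred from the block counts in the tables, which is an assumption: recall as truly detected over actual blocks, precision as truly detected over all detected blocks, and misdetection rate as blocks with missing data over truly detected blocks. These definitions reproduce most table entries up to rounding. The function name `detection_metrics` is hypothetical.

```python
def detection_metrics(actual, truly_detected, falsely_detected, missing):
    """Return (recall, precision, misdetection_rate) as fractions.

    The definitions are inferred from the table counts, not quoted
    from the paper.
    """
    recall = truly_detected / actual
    precision = truly_detected / (truly_detected + falsely_detected)
    misdetection_rate = missing / truly_detected
    return recall, precision, misdetection_rate


# Counts for sample video 2, correlation in frequency domain (Table 2).
recall, precision, misdetection_rate = detection_metrics(
    actual=138, truly_detected=125, falsely_detected=13, missing=13
)
percentages = [round(100 * value, 1) for value in (recall, precision, misdetection_rate)]
```

For this column the three percentages agree with the values reported in Table 2.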

VI. CONCLUSION

This paper presented a comparative analysis of two template matching approaches: normalized cross correlation and correlation in the frequency domain. In the verification step of the text detection system, the similarity value of the objects in each candidate region is calculated using either normalized cross correlation or correlation in the frequency domain. Experiments were conducted on various kinds of video that contain overlay Myanmar text. The experimental results show that normalized cross correlation has higher accuracy than correlation in the frequency domain, but its main drawback is that it is computationally expensive. Conversely, correlation in the frequency domain has a faster processing time than normalized cross correlation.

VII. REFERENCES

[1] Nuno Roma, Jose Santos-Victor, “A Comparative Analysis of Cross-Correlation Matching Algorithms Using a Pyramidal Resolution Approach”, Citeseer, 2002.

[2] Fawaz Alsaade, “Fast and Accurate Template Matching Algorithm Based on Image Pyramid and Sum of Absolute Difference Similarity Measure”, Research Journal of Information Technology, vol. 4, 2012, pp. 204-211.

[3] Konstantinos G. Derpanis, “Relationship Between the Sum of Squared Difference (SSD) and Cross Correlation for Template Matching”, 2005.

[4] J. P. Lewis, “Fast normalized cross-correlation”, 1995. Online, Internet, Available:

[5] K. Briechle, U. D. Hanebeck, “Template matching using fast normalized cross correlation”, in Proceedings of SPIE, vol. 4387, Optical Pattern Recognition XII, Orlando, FL, 2001, pp. 95-102.

[6] Rafael C. Gonzalez, Richard E. Woods, Digital Image Processing, 2nd edition, 2002.

[7] Thu Zar Tint, Dr. Nyein Aye, “Overlay Myanmar Text Region Detection and Extraction from Video Scenes”, International Conference on Computer Science and Information Technology, pp. 7-13.

