1
0-7803-77241-9/03/$17.00 © 2003 IEEE
Automated System for Analysis and Selection of Motion Estimation Algorithm for Video Compression
1
Evgeny Kaminsky and Ofer Hadar
1
Abstract
'This paper addresses a new automated system for analysis and selection of motion estimation (ME) algorithms. We present the architecture of algorithm evaluation, ME algorithms, results of computer simulations and illustrate the analysis with tables, PSNR plots, and performance plots.
Lately, numerous algorithms and high amount of tests for evaluation of algorithms performance for video compression have been suggested, which has resulted in a need for effective evaluation methods. The highly qualified expert is also needed for conventional comparison of test results; nevertheless it doesn't eliminate the subjectivity and mistakes. The more input parameters are used, the more complex and subjective the evaluation will be.
Our automated system for algorithms evaluation and selection eliminates subjectivity and gives the qualified and quantified evaluation of the tested algorithms when using any number of the evaluated algorithms and parameters.
In the following paper we propose two new methods for evaluation of the motion estimation algorithm: (1) quality evaluation method - a graphic method using the Pareto approach and (2) quantity method -allows getting the integrated evaluation parameter composed of numerous evaluation criterions. Four evaluation strategies are suggested: (a) a strategy with constraints, (b) a strategy with "near" Pareto-efficient points, (c) a strategy with integrated evaluation parameter and (d) a strategy with importance definition. Four basic test video sequences were used.
The proposed method is universal and therefore can be used for evaluation of any other algorithms as well.
Index Terms — Fast Motion Estimation (ME), Block Matching Algorithms (BMA), Video Analysis, Algorithm Evaluation, Algorithm Selection, Multi-Criteria Analysis, Computer Simulation, Algorithm Evaluation Environment.
I. Introduction
As the number of the ME algorithms [1] and tests video sequences [2] for algorithms performance evaluation grow, there is a need for effective evaluation methods. Numerous evaluation criterions (process time, error rate, bit-rate, delay, implementation complexity, and others) are used for the algorithms performance evaluation. The highly qualified expert is needed to select the best algorithms from the total set of algorithms; however it doesn't eliminate the subjectivity and mistakes. The more input values (algorithms, test video sequences, evaluation criterions and etc.) are used, the more complex and subjective the evaluation will be.
The goal of this work was to design a new automated system for algorithm selection, simulation and analysis.
We pay special attention to the following: (a) methods for multi-parameter analysis of algori1ihms using Pareto approach (e.g., quality comparison and selection of the best algorithms) and (b) performance analysis of algorithms (e.g., quantity comparison based on integrated evaluation parameter). Additional goal was to organize these evaluation methods as a basis for student projects involving design of motion estimation (ME) algorithms and analysis of project results.
The paper describes our MA TLAB [3] software-based system (see Fig. I) for evaluation and selection of algorithms for video compression. Computer experiments were based on: (i) four traditional test video sequences (Hall-monitor, Coastguard, Foreman, and Table- Tennis) and (ii) nine fast ME algorithms (fixed search area and adaptive search area).
Algorithm analysis involved:
- Traditional comparison of test results on the basis of tables and graphics, mainly for PSNR, speed up gain (average number of operations) versus full search (FS), and rate-distortion (R-D) performance.
- Multi-criteria analysis of algorithms based on an integrated R-D parameter and a Pareto approach.
- Performance analysis of algorithms using integrated evaluation parameter.
The evaluation process involved the following: (a) simulations (i.e., execution of algorithms) on the basis of MATLAB software with compilation of programs into the C codes (Dynamic Link Library) and (b) a subsystem for the accumulation, analysis and selection of results (MA TLAB programs, comparison tables and graphics).
II. ARCHITECTURE OF Algorithm Evaluation
Figure 1 depicts a generalized scheme of our MATLAB-based automated system for ME algorithm analysis. The evaluation environment includes: (a) a library of algorithms; (b) selection and description of video sequences; (c) accumulation of test results; and (d) analysis of results and their visualization.
Fig. 1. General structure of algorithm analysisFour test video test sequences were used (30 frames in each sequence, resolution 352x288 on format CIF): Hall-monitor, Coastguard, Foreman, and Table and Tennis.
We evaluated nine fast ME algorithms [1], [4], [5]:
Group 1: Algorithms with fixed search area:
A1.Full-search block-matching algorithm (FSBMA) or exhaustive block-matching algorithm (EBMA)
A2.Three-step search (TSS or 3SS)
A3.Two-dimensional logarithmic search (2DLOG or TDL)
A4New three-step search (NTSS or N3SS)
A5Four-step search (4SS)
Group 2: Algorithms with adaptive search area:
A6Diamond search (DS)
A7Three-step gradient-based diamond search (TSGBDS)
A8Center-biased three-step gradient-based diamond search (CBTSGBDS)
A9Two-step gradient-based diamond search (2SGBDS)
The database for the algorithms involves algorithms with two distance criterions (MSE, SAD). Table I presents the simulation results for algorithms which used SAD as the distance criterion.
III. COMPARISON OF ALGORITHMS
The following computer platform was used for the computer experiments:
- AMD Athlon XP 1700+
- 512 MB DDR
- Windows 2000
- MATLAB 6.5
MATLAB programs are preliminarily transformed (by MATLAB Compiler, Lcc C version 2.4) into C codes.
A. Criteria for algorithm analysis
The following 10 traditional criteria are used:
Group 1: Criteria for the ME operation only
C1 - total time of algorithm processing between original and predicted frames
C2 - average number of macro-block matches between original and predicted frames
C3 - average MAD (mean absolute difference) or MAE (mean absolute error) between original and predicted frames [1], [6]
C4 - average MSE (mean square error) between original and predicted frames [1], [6]
C5 - PSNR between original and predicted frames
C6 - average number of steps (A2, A3, A4, A5, A6, A7, A8, A9) for macro-block matching between original and predicted frames
Group 2: Criteria for a whole codec
C7 – average MSE of motion compensation image [1], [6]
C8 - PSNR of motion compensation image
C9 - bit-rate for motion compensation image
C10 – motion vectors bit-rate for a frame (P frame)
For the computer simulation we collected the results of all the criteria. However, to compare the performance of the algorithms, we used only those criteria commonly found in the literature: C2, C5, C7, C9, and C10.
B. Examples of simulation results
Some examples of our simulation results are presented in Table I. The analysis of the results involves the following: (1) database of results; (2) PSNR graphics for each test video sequence (Fig. 2); (3) histograms for bits distribution (Fig. 3); and (4) performance comparisons (Figs. 4, 5, 6 and 7).
Fig. 2. Per frame PSNR for Foreman sequence (SA32) for the ME operation only (left side) and for inter-frame coding with a standard MPEG-2 coder with quantizer step size=1 (right side)Fig. 3. Distribution of bits for inter-frame coding of the Foreman sequence at various bit rates with a standard MPEG-2 coder (SA32)
Fig. 4. Rate-distortion comparison of ME algorithms on Foreman sequence (SA32) with quantizer step size (Q) varying from 1 to 32
Fig. 5. Rate-distortion comparison of ME algorithms on Foreman sequence (SA32) with quantizer step size (Q) varying from 1 to 32 after cutting the edges
Fig. 6. Two-criteria comparison of ME algorithms (SA32) for Foreman (left side) and for Coastguard (right side)
IV. Negative PSNR Peak
A negative PSNR peak is an important criterion for evaluating ME algorithms. In some of our experiments the PSNR had a high negative peak:
- Example: Fig. 2; A2, A3, A5 A7, and A9, frames 20, 21 (peak) and 22.
For this example, the above-mentioned algorithms do not have enough search points around the center or have a wrong stop rule: “Optimal (minimal) point in the center”. The improvement strategy is to use a multi-stage stop rule (by increasing the number of search points in the local search area). For example, algorithm A8 works better in the same situation, because it has a three-stage stop rule with a different search step at each stage.
V. Multi-criteria Analysis of Algorithms
Figures 5 and 6 illustrate multi-criteria comparisons of the algorithms. Multi-criteria selection methods are described in [7], [8], [9] and [10]. For multi-criteria analysis of ME algorithms based on a Pareto approach [9] we use the following criteria C2, C7, C9, and C10. However, for ease of presentation and comprehensibility we use a two-criterion analysis (C2 and C11, see Fig. 6).
C11 is a new criterion composed of the criteria C7, C9 and C10 which were used for R-D performance comparison. It is calculated by the following simple algorithm:
We can compare the R-D performance (see Fig. 4) of ME algorithms by calculating the polygonal area (the polygonal area consists of the R-D performance plot and the point where the Y (upper limit) and X (right limit) coordinates meet) as follows:
1. Cutting the edges of the R-D plots (Fig. 5, upper side) by using the Y and X axis’s upper and right limits:
Yupper=min(avgMSE(Qmax,1:9));
Xright=min(avgBits/Frame(Qmin,1:9)),
where avgMSE and avgBits/Frame are Y and X coordinates of the plots on Fig. 4; Qmin and Qmax are the corresponding minimal and maximal quantizer step sizes; and 1:9 represent all nine ME algorithms.
2. After cutting the edges (finding the corresponding X left and Y bottom limits) for each R-D function we can calculate C11 by calculating the polygonal area (Fig. 5, lower side) which consists of the normalized R-D plot and a point with coordinates Xright, and Yupper.
C11 is minimal for the worst R-D performance and maximal for the best R-D performance. Some results of the C11 calculation are presented in Table I.
TABLE I
Results of Computer Experiments
Sequence / SA / A1 / A2 / A3 / A4 / A5 / A6 / A7 / A8 / A9Hall-
monitor
(CIF) / 8 / PSNR(ME) / 35.82 / 35.79 / 35.78 / 35.79 / 35.79 / 35.78 / 35.78 / 35.79 / 35.79
Speed Up / 1.00 / 8.52 / 10.32 / 18.37 / 11.57 / 27.69 / 12.30 / 13.67 / 17.43
R-D (x105) Performance / 1.868 / 1.866 / 1.868 / 1.868 / 1.871 / 1.872 / 1.867 / 1.872 / 1.873
16 / PSNR(ME) / 35.82 / 35.79 / 35.79 / 35.79 / 35.79 / 35.78 / 35.79 / 35.79 / 35.79
Speed Up / 1.00 / 32.17 / 49.79 / 96.82 / 61.09 / 172.5 / 73.21 / 86.09 / 102.9
R-D (x105) Performance / 1.860 / 1.859 / 1.863 / 1.863 / 1.866 / 1.867 / 1.864 / 1.868 / 1.868
32 / PSNR(ME) / 35.82 / 35.79 / 35.79 / 35.80 / 35.79 / 35.78 / 35.79 / 35.79 / 35.79
Speed Up / 1.00 / 95.2 / 153.4 / 345.2 / 224.4 / 633.6 / 268.9 / 316.2 / 378.3
R-D (x105) Performance / 1.855 / 1.857 / 1.861 / 1.861 / 1.865 / 1.866 / 1.863 / 1.866 / 1.867
Coast-
guard
(CIF) / 8 / PSNR(ME) / 29.57 / 29.52 / 29.52 / 29.37 / 29.53 / 29.36 / 29.51 / 29.51 / 29.52
Speed Up / 1.00 / 11.29 / 16.38 / 9.86 / 14.39 / 22.76 / 14.92 / 17.27 / 19.85
R-D (x106) Performance / 4.052 / 4.039 / 4.04 / 3.966 / 4.041 / 4.596 / 4.035 / 4.041 / 4.041
16 / PSNR(ME) / 29.57 / 28.98 / 29.13 / 28.85 / 29.53 / 29.36 / 29.52 / 29.52 / 29.53
Speed Up / 1.00 / 32.16 / 49.73 / 31.17 / 53.97 / 85.50 / 55.80 / 57.40 / 74.44
R-D (x106) Performance / 4.045 / 3.969 / 3.994 / 3.898 / 4.035 / 4.007 / 4.029 / 4.033 / 4.034
32 / PSNR(ME) / 29.58 / 28.70 / 28.88 / 28.58 / 29.53 / 29.36 / 29.52 / 29.52 / 29.53
Speed Up / 1.00 / 95.2 / 153.3 / 98.8 / 198.2 / 314.1 / 205.0 / 210.9 / 273.4
R-D (x106) Performance / 4.034 / 3.919 / 3.944 / 3.849 / 4.023 / 3.995 / 4.018 / 4.022 / 4.022
Foreman
(CIF) / 8 / PSNR(ME) / 33.84 / 33.05 / 32.86 / 33.38 / 33.26 / 33.18 / 32.54 / 33.00 / 33.00
Speed Up / 1.00 / 11.27 / 16.38 / 14.3 / 14.71 / 24.14 / 16.25 / 17.15 / 19.44
R-D (x105) Performance / 5.894 / 4.858 / 5.014 / 5.422 / 5.192 / 5.474 / 4.737 / 5.221 / 5.163
16 / PSNR(ME) / 33.97 / 32.98 / 32.89 / 33.33 / 33.39 / 33.35 / 33.03 / 32.99 / 33.20
Speed Up / 1.00 / 32.10 / 49.74 / 47.62 / 52.99 / 86.82 / 54.75 / 57.84 / 69.34
R-D (x105) Performance / 5.528 / 4.354 / 4.591 / 4.976 / 4.888 / 5.151 / 4.634 / 4.801 / 4.867
32 / PSNR(ME) / 33.99 / 32.72 / 32.69 / 33.07 / 33.39 / 33.35 / 33.07 / 33.08 / 33.21
Speed Up / 1.00 / 95.0 / 153.3 / 157.24 / 194.50 / 318.3 / 199.0 / 210.90 / 254.03
R-D (x105) Performance / 5.4493 / 4.1342 / 4.4404 / 4.7847 / 4.8094 / 5.0729 / 4.5773 / 4.7373 / 4.7928
Table
Tennis
(CIF) / 8 / PSNR(ME) / 31.45 / 30.88 / 30.71 / 30.24 / 30.62 / 30.07 / 30.49 / 30.67 / 30.41
Speed Up / 1.00 / 11.26 / 16.35 / 21.42 / 15.77 / 35.68 / 18.74 / 21.08 / 23.94
R-D (x106) Performance / 1.145 / 1.076 / 1.077 / 1.121 / 1.126 / 1.154 / 1.043 / 1.139 / 1.132
16 / PSNR(ME) / 32.23 / 31.19 / 31.10 / 30.37 / 31.07 / 30.50 / 31.34 / 31.31 / 30.93
Speed Up / 1.00 / 32.06 / 49.64 / 74.92 / 58.38 / 131.1 / 65.73 / 74.41 / 87.88
R-D (x106) Performance / 1.028 / 0.906 / 0.916 / 0.98 / 1.013 / 1.036 / 0.954 / 1.019 / 1.016
32 / PSNR(ME) / 32.87 / 30.56 / 30.59 / 29.76 / 31.20 / 30.54 / 31.59 / 31.56 / 31.01
Speed Up / 1.00 / 94.95 / 153.10 / 258.3 / 213.85 / 479.41 / 237.96 / 270.92 / 321.2
R-D (x105) Performance / 9.046 / 7.488 / 7.637 / 8.326 / 8.902 / 9.128 / 8.337 / 8.96 / 8.932
A. Basic Evaluation Strategy
Table II contains Pareto-efficient points for the case “search area 32”. Our basic evaluation strategy is as follows:
- Algorithm A6 corresponds to the Pareto-efficient point in four cases of frame sequences (the best one).
- Choose the best algorithm:
Let {1,…,i,…,K} be a set of examples (frame sequences), and {A1,…,Ai,…,An} be a set of algorithms. For j we perform a computer experiment and select Pareto-efficient points: Pj {A1,…,Ai,…,An}. For j we consider the weight parameter αj [0,1]. Now we can use the following coefficient:
(1)
TABLE II
Pareto-efficient points (without A1), (SA32)
Sequence / Pareto-Efficient Points / Near Pareto-Efficient PointsHall-monitor / A6, A9 / A8
Coastguard / A6, A9 / A5, A7, A8
Foreman / A6 / A5, A8, A9
Table tennis / A6 / A8, A9
A resultant priority for algorithm (i) is the following:
(2)
Thus, an algorithm with maximal priority will be the best one (Table III).
TABLE IIi
Total priority of algorithms
Algorithm\ Sequence / 1 / 2 / 3 / 4 / Total priorityA1 / - / - / - / -
A2 / 0 / 0 / 0 / 0 / 0
A3 / 0 / 0 / 0 / 0 / 0
A4 / 0 / 0 / 0 / 0 / 0
A5 / 0 / 0 / 0 / 0 / 0
A6 / 1 / 1 / 1 / 1 / 1
A7 / 0 / 0 / 0 / 0 / 0
A8 / 0 / 0 / 0 / 0 / 0
A9 / 1 / 1 / 0 / 0 / 0.5
B. Evaluation Strategy with Constraints
In practical situations (e.g., real-time implementation) for choosing a set of preferred algorithms it is reasonable to use some constraints on the parameters:
- Example 1: Speed Up > X (e.g., X=200 for “Table tennis” SA32).
- Example 2: R-D Performance > Y (e.g., Y=105 for “Table tennis” and SA32).
As a result, only a part of the initial set of algorithms will be selected. Thus, such constraints can augment the basic evaluation strategy.
C. Evaluation Strategy with near Pareto-Efficient Points
Here we take into account some algorithms which are very close to Pareto-efficient points. Table II displays a set of near Pareto-efficient algorithms for each video sequence benchmark. Thus, in this strategy the selected algorithms are Pareto-efficient points (for some benchmarks) and near Pareto-efficient points (for other benchmarks):
- Example: A9 (Pareto-efficient point for benchmarks 1 and 2; near Pareto-efficient point for benchmarks 3 and 4).
Fig. 7. Framework of student projects
As it's seen from table I, it's very difficult to perform the correct analy8is without using the integrated parameter. It can cause us to get faulty results. It can be explained by the fact that parameter PSNR doesn't take into account bit rate, while algorithm A6 significantly reduces the bit rate of the motion vectors (see figure 3). From the table I it's seen that almost all the algorithms are better than A6 algorithm by PSNR criteria. Comparison using the integrated R-D performance criteria prevents this situation, but in case the number of parameters (set of test sequences, set of algorithms, set combination of control parameter and set of algorithm criterions) grows, the complexity of the algorithm analysis will significantly grow as well. Comparison using the integrated normalization criterions (see vii) will drastically decrease complexity of algorithm analysis. By doing this, we'll get unspecified parameters values (coefficients) varying from 0 to 1, while 1 is the maximum parameter value. We can get to the situation when the best algorithm for all parameters will be the algorithm that wasn't the best for any parameter (i.e. was always on the 2nd, 3rd or other place). The final integrated parameter will be selected by multiplication of all the unspecified parameters and choosing the largest one.
- Basic Evaluation Strateg
- y
Figure 7 and table IV illustrate performance comparisons of the algorithms. Our basic evaluation strategy is as follows:
Let: 1,..,i,..,R: be a set of test sequences (frame sequences), {Ai,...,Aj,...,Am} be a set ofal~~orithms, {B1,..., B1,...,Bp} set of combination of control parameters and { C i ,. .., C k ,. .., C } be a set of criterions for algorithm (j) and frame sequence (i)
ijl iil " !il
and set of combination of control parameters (I). For \7" Aj we perform a computer experiment and define set of criterions. For V' C I, we detine new unspecified normalized criterion ( C k ) using the following formula:
'JI
Where: O < C k ::; I ,"
The total combined performance of algorithm (j), frame sequence (i) and combination 4)f control parameters (I) is the following:
pij(= fI CCJ( (4) !II kc
"k=1 Where: O < Pill::; I
The average performance of algorithm (j) and combination of control parameters (I) for all frarne sequences is the following:
1 R .(5)
PJ = -~ PJ J
RL.. !I i=1
Where: (J < Pil ::; I
The average performance of algorithm (j) for all combination of control parameters for all frame sequences is the following:
1 p R (6)
p = -~ ~ Pi1 J
RpL..L.. J 1=1 i=i
511
1
~
..
~ ""
.
Where: 0 < Pi::;: I
We use the integrated evaluation parameter (CIJ for performance analysis of ME alg<Jlrithms based on a performance evaluation strategy. Cl2 is a new criterion composed of the criterions C2 and Cl1 (see the previt)us paragraph). It is calculated by applying the basic evaluation strategy described above to the ME analysis basis:
For (0 EMBED Equation.3 0 0 0 we define a new unspecified normalized criterion (0 EMBED Equation.3 0 0 O ) using the tollowing tormula:
O EMBED Equation.3 D O D (7)
Where: 0 < c-;- ::;: 1 -,,'
For';/ C II ", we define a new unspecified normalized criterion ( ~ ) using the following forrrlula:
C (8 C = 11iji Iliji 9
max
UC11
yi j=1
Where: 0 <~::;: I ",
The total combined performance of algorithm (j}, frame sequence (i} and combination ~)f control parameters (I) is the tollowing:
C = C .C (9 12 iji 2iji II iji
Where: 0 < Cl2 ::;: I "I
The a verage performance of algorithm (j} and combination of control parameters (I) for all frame sequences (4} is the tollowing:
C -~ 4 C (10) 12Ji-4t:112iji
Where: 0< C12ii ::;: I
The average performance of algorithm (j} for all combination of control parameters (3: SA=8, SA= 16 and SA=32} for all frame sequences is the following:
1 3 4 (11} C = -"""' """' C
12 j 12 7:0; 7:0; 12 iji
Where:O<Cl., ::;:1 .,
512
VI. Student projects
Small student teams have conducted the following research projects (Fig. 7): (a) an analysis of algorithms for multimedia communication systems; (b) preparation of database blocks for the algorithms; (c) design of algorithms; (d) computer simulation and test results; and (e) comparison of algorithms on the basis of test results and discussion.
Note that (e) was organized as a joint project for all student teams.
VII. Conclusions
We have described a new computer environment for evaluation of ME algorithms for video compression. We also examined multi-criteria analysis of algorithms for video compression. The new integrated parameter (C11) and a Pareto approach for performance comparison can be used in many other situations (e.g., R-D and complexity-distortion (C-D) codec controls for decision making in ME operations). Our work is only a first step in multi-criteria analysis. Many other algorithms for multimedia can be investigated on this basis.
Future research directions are the following:
- In-depth studies of various kinds of test video sequences and integration of the corresponding results.
- Multi-criteria methods for algorithm evaluation and decision making.
- Evaluation of numerous other algorithms.
- Realization of our computer environment in SIMULINK.
- Implementation of our algorithm evaluation environment in DSP.
- Continuation of our experiments in educational processes.
(1) Acknowledgment
The authors thank Dr. Mark Sh. Levin, Dr. Shlomo Greenberg and Mr. Oleg Kruchkov for their useful comments.
References
[1]P. Kuhn, Algorithms, Complexity Analysis and VLSI Architectures for MPEG-4 Motion Estimation, Kluwer Academic Publishers, Boston, 1999, pp. 17–50.
[2]VQEG (Video Quality Expert Group)
[3]MATLAB
[4]K.-K. Ma, P. I. Housr, and L. Huang,”Status Report of Core Experiment on Fast Block-Matching Motion Estimation”, ISO/IEC JTC1/SC29/WG11, MPEG98/M3299, Tokyo, March 1998
[5]D. Turaga and M. Alkanhal. (1998, Spring). Search Algorithms for Block-Matching in Motion Estimation (Mid-Term project, 18-899) [Online]. Available at
http://www.ece.cmu.edu/~ee899/project/deepak_mid.htm
[6]C. Stiller and J. Konrad “Estimation Motion in Image Sequences: a tutorial of modeling and computation for 2D motion”, IEEE Signal Processing Magazine, July 1999.
[7]P.C. Fischburn, Utility Theory for Decision Making, New York, Wiley, 1970.
[8]R.L. Keeney and H.Raiffa, Decision with Multiple Objectives: Preferences and Value Tradeoffs, New York, Willey, 1976.
[9]V. Pareto, General Notion of Economic Equilibrium. Manual of Political economics, 1971, English translation by A.S. Schwier, McMillan, New York, (first published in 1896)
[10]R.E. Steuer, Multiple Criteria Optimization: Theory, Computation, and Application, New York, Willey, 1986.
Evgeny Kaminsky received the B.Sc. and the M.Sc. degree in Electrical and Computer Engineering from Ben-Gurion University in 1995 and 1999, respectively. He worked as a VLSI Engineer with VisionTech Ltd (now BroadCom Israel) in the area of MPEG-2, and with Intel Israel in the area of Pentium-4 manufacturing testing. He is presently a Ph.D. student at the Department of Electrical and Computer Engineering at Ben-Gurion University. Mr. Kaminsky is interested in video information processing, algorithms for image compression, and VLSI design.