Shan-Hsiang Shen
An enhanced estimation: motion and rotation estimation
Shan-Hsiang Shen
- Introduction
There are many kinds of video encode method used nowadays such as H.264 and MPEG. Internet has been well developed and is a widely used platform to spread video out. Millions of people exchange video such as on youtube. Thus, how to encode video efficiently is an important topic.
The goal of video encode is to make video file smaller and make it looks like original one. To make file smaller, video compression is used.
There are many components included in video encoding such as Discrete Cosine Transform, Quantization, and Huffman Coding. This project will focus on motion estimation.
- Related work
An important idea of video compression is to avoid representing the same or similar information again and again. For example, in most cast, a video frame will not change a lot compared to adjacent frames. If we can store these similar parts only one time, a lot of storage will be saved.
However, how to match the similar parts between two or more frames? It is why motion estimation is developed. The similar part among frames may move. For example, there is a video about a moving car. The car may move from the left of a frame to right. The body of car will be the similar part of some frames but it moves.
Therefore, the goal of motion estimation is to detect how these similar parts move. A motion vector is defined to represent the movement. It has a format (x, y). x is the distance (how many pixels) it moves in x direction, while y is the distance in y direction.
To simplify the problem, a video frame will be divided into several blocks. Each block will be processed independently. For each block called current block, we try to match it with some similar parts of other frames called reference frames. The similar here is decided by sum of absolute differences (SAD). The lower SAD means more similar between two blocks.
To save computation time, we define a range in a reference frame called search range. We only try to match the current block with some part in this search range. For each current block, we need to scan all pixels in a search range and compute the SAD between the current block and pixels we scan as the figure above.
Figure 1
The blue square is a current block and bigger black border is a searching range. The most left-top pixel of a current block will trace from the most top-left of a search range to the most right-bottom pixel by pixel.
For each pixel it stays, pixels difference will be summed up as SAD. Each pixel difference will be defined as following equation.
Diff(pixel A, pixel B) = | A. red – B. red| + |A. green – B. green| + |A. blue – B. blue|
After scanning over all area of a searching range, the smallest SAD will be picked up and a motion vector will be calculate. It is a vector from the absolute position of a current block in current frame to the position of area matched in a searching range.
After motion estimation, video encoding method will drop current frames and only keep reference frames as well as motion vectors. To recover the current frames, it can be pasted from reference frames according to motion vectors.
3 Motion and rotation estimation
3.1 Basic idea
As described above, the goal motion estimation is to find the similar part in searching range for each block. The quality of compressed video will be affected by how similar it can match. If we can find totally match, the compressed video will look the same as original video. However, it is very hard to reach. All we can do is find more similar match as we can. More similar means lower SAD.
The object in a video may not only move but also rotate. For example, a ball rolls from the left of a frame to the right. In this situation, if we can match each block with different angle. More similar match can be figured out. This new method is called motion and rotation estimation.
The algorithm of motion and rotation estimation is similar to motion estimation. For each block, it will also scan all pixels in searching range. However, in each position of searching range, the current block rotates in different angles to calculate SAD for each angle. Thus, there will be more candidate matches we can choose and the one we pick up will better than that by motion estimation.
3.2 Image rotation method
There are bunch of rotation method used. There are three widely used methods:
Assigning source pixels to destination pixels
Assigning destination pixels from source pixels
Assigning destination pixels from source pixels with bilinear interpolation
The first method has serious assigned pixel problem. Because we use sin and cos to map source pixels to destination pixels, so pixels in destination will not be assigned values.
The second and the third methods are similar, but the third method use bilinear interpolation to make image more smooth. Thus, we consider the third method in this project.
3.3 The angle to be tested
Figure 2
Figure 2 shows how a current block rotates. It rotates around the center of it counterclockwise. For each angle, a SAD will be calculated.
However, the number of angles and which angle should be rotated will be the next issue.
We can see the rotation as the most left-top pixel move along a circle as showed in Figure 3.
Figure 3
For example, it can be move from the point a on the circle to the point b. The angle is just the angle of a->c and b-c, where c is the center of the circle. To decide which angle the block should rotate, we pick up some points on the circle which will be the point the most top-left pixel of the block should move to.
Because the frame is formed be pixels discretely, we can see a frame as a pixel grid. We scan from the first row of pixel to the last one. For each row, we calculate two points which lie on the circle. The following equation is used.
Where is the position of the center of the circle, and
Given a y corresponding to each row, we can calculate one or two x’s values. These will be one or two points on the circle.
3.4 Pixel mapping
For a particular position and angle, each pixel in a current block should be mapped to a pixel in a searching range. This subsection will discuss how to map pixels.
According to the last subsection, the angle can be calculated. Given this angle we can use the equation below to map pixels.
Where is a position in a current block, is a corresponding position in a searching range, and θis the rotation angle counterclockwise. Depending on this equation, we can find a corresponding pixel for a pixel in a current block to compare RGB value which is needed in SAD.
As motion estimation, the lowest SAD is chosen and motion/rotation vector will be calculated. The format is different from motion vector.
(x, y, angle)
An extra member of a vector which is angle is added to show how a current block rotates to get corresponding reference pixels.
4 Evaluation
4.1 Evaluator
To evaluation motion and rotation estimation algorithm, I write an evaluator using JAVA. JAVA does not support some multi-media such as video directly. Thus, Java Media Framework (JMF) is installed. JMF can read in a video from a file as well as write it out.
Figure 4
Figure 4 shows the three main components which are video reader, motion and rotation estimation, and video writer.
Video reader: it reads in a video file and divides the video into many frames. We can do any process on these frames. This component will call the method included in JMF. JMF can support following video format.
.avi (Cinepak, MJPEG, RGB, YUV, VCM)
.mpg (MPEG – 1)
.mov(Cinepak, H.261, H.263 , JPEG, RGB)
(JMF supports audio, too)
Motion and rotation estimation: I implement the main algorithm in this component. Original motion estimation is also implemented here for comparing the result. This component can be seen as video encoder. In this evaluator, it will pass reference frames and motion/rotation vectors to video writer. In addition total SAD for each current frame will be outputted here. It will be an indicator of performance.
Video writer: After receiving reference frames and motion/rotation vectors, video writer will recover all skipped frames and output complete video.
4.2 Evaluation setup
To compare the algorithm, I use a short video to evaluation.
Frame size: 352 X 288
Frame rate: 24 frames/seconds
Data rate: 70084 kbps
Time: 4.24 seconds
This video is in uncompressed .avi format. For both motion estimation and motion/rotation estimation, I set a reference frame per ten frames. Other frames will refer to these reference frames.
The block size is 8 X 8 pixels, and the size of searching range is 16 X 16 pixels. The total SAD of a frame is just the sum of SAD of all blocks in this frame. The total SAD will be use to evaluate.
4.3 Evaluation result
Because it is hard to judge the result got from eyes, I use more reasonable indicator which is Total SAD of each frame and the average SAD. We want to show that motion/rotation estimation can get better match with lower total SAD.
The following figure shows the result.
Figure 5
The x-axis is the sequence number of frames, while y-axis is the corresponding Total SAD for each frame.
It shows that motion/rotation estimation can get equal or better match compared to motion estimation in all time. It works much better in the peak of total SAD. It means it gets higher performance when difference level between a current frame and a reference is high. It can find some match that motion estimation cannot find.
The average SAD for each frame is
Motion estimation: 3812728
Motion/rotation estimation: 3681778
The average SAD per pixel is
Motion estimation: 37.60977
Motion/rotation estimation: 36.31804
The average SAD per color in a pixel is
Motion estimation: 12.53659
Motion/rotation estimation: 12.10601
In average, motion/rotation estimation works better than motion estimation.
5 Feature work
Compared to motion estimation, motion/rotation estimation needs much more computation time, because it needs to calculate much more SAD for each position in a searching range. Therefore, the algorithm can be improved further by skip some computation. For example, it can choose the angle to rotate smaller such as working as binary search.
According to average SAD, motion and rotation estimation does not improve enough. It can be a feature topic.
6 Conclusion
Video encoding technology is improving. How to get more portable and high quality will be a good topic. Motion estimation tries to drop redundant information in a video file and motion/rotation estimation does it further more. The balance among computation time, compress rate, and quality of video is also an important issue.
7 Reference
1 ITU-T Recommendation H.264
2 Variable size block motion estimation
Shckler, D.; Ozturk, Y.; Abut, H.; Signals, Systems & Computers, 1998. Conference Record of the Thirty-Second Asilomar Conference on , Volume: 1 , 1-4 Nov. 1998 Pages:868 - 872 vol.1
3 Adaptive adjustment of the search window for block-matching algorithm with variable block size
Hwang-Seok Oh; Heung-Kyu Lee; Consumer Electronics, IEEE Transactions on , Volume: 44 , Issue: 3 , Aug. 1998 Pages:659 – 666
4 Java media framework (JMF) API
1