DGA/DET/LRBA/LNA 20/09/2004

Visual pose estimation of a small helicopter

Benoît RENAUD*

We present a vision-based method to estimate the pose (i.e. the attitude and position) of an object in space. Our algorithm takes a video sequence as input, from which we extract the 2D coordinates of characteristic points in the image. Knowing the 3D coordinates of these points in the frame of the object and their 2D images, we are able to reconstruct the pose of the object. Our method relies on known algebraic transformations between the 3D object and the image, as well as on distortion models. We test our method on practical cases, for which we show the validity of the approach.


Introduction

The goal of the present technique is to provide an independent way to measure the pose of a solid in space. The algorithm takes as input a video camera sequence and a solid carrying characteristic points. It makes it possible to estimate the pose (i.e. position and attitude) of this solid without drift and with good accuracy.

The characteristic points we used are colored balls. As shown in Figure 1, the target to be detected and tracked in order to estimate the solid's pose consists of five balls forming a cross. With knowledge of the cross geometry and of its projection on the image plane, the pose can be estimated, provided the camera is calibrated. The first step of the visual pose estimation process is therefore the camera calibration.

Software has been developed to achieve this task, based on known algorithms such as blob tracking. This work relies on OpenCV, an open-source computer vision C++ library [1].

*Research intern

LRBA/LNA (Laboratoire de Recherches Balistiques et Aérodynamiques)

BP 914

27200 Vernon Cedex

Tel: 02.27.24.

Figure 1: cross tracked by the method in the current implementation.

Camera Calibration

The camera must be calibrated in order to conform to the pinhole camera model, that is, in order to obtain a simple relation between the 2D image coordinates on the screen and the 3D coordinates in space. These 2D and 3D coordinates are related by :

$$s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \qquad (1)$$

$f_x$ and $f_y$ are the focal lengths in pixels along the X and Y axes, and $(c_x, c_y)$ are the image coordinates of the optical center (the intersection between the optical axis and the image plane, theoretically the middle of the image).

The calibration method used is Zhang's auto-calibration method [2]. In the OpenCV library, lens distortions are modeled with four correction terms $k_1$, $k_2$, $p_1$, $p_2$ as follows :

$$\begin{aligned} \tilde{x} &= x\,(1 + k_1 r^2 + k_2 r^4) + 2 p_1 x y + p_2 (r^2 + 2 x^2) \\ \tilde{y} &= y\,(1 + k_1 r^2 + k_2 r^4) + p_1 (r^2 + 2 y^2) + 2 p_2 x y \end{aligned} \qquad \text{with } r^2 = x^2 + y^2$$

$(x, y)$ are the ideal (i.e. distortion-free) image physical coordinates and $(\tilde{x}, \tilde{y})$ are the real, that is, distorted, image physical coordinates. Without distortion, $k_1 = k_2 = p_1 = p_2 = 0$.
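As an illustration, the sketch below shows how such a calibration can be run with OpenCV's implementation of Zhang's method, assuming a planar chessboard target observed in several views; the helper name and board parameters are hypothetical and not those of the original software.

```cpp
// Sketch: camera calibration with OpenCV (Zhang's method), assuming chessboard views.
#include <opencv2/calib3d.hpp>
#include <vector>

void calibrateFromChessboard(const std::vector<cv::Mat>& views, cv::Size boardSize,
                             float squareSize, cv::Mat& cameraMatrix, cv::Mat& distCoeffs)
{
    // 3D corner coordinates in the board frame (planar target, Z = 0).
    std::vector<cv::Point3f> board;
    for (int y = 0; y < boardSize.height; ++y)
        for (int x = 0; x < boardSize.width; ++x)
            board.emplace_back(x * squareSize, y * squareSize, 0.f);

    std::vector<std::vector<cv::Point3f>> objectPoints;
    std::vector<std::vector<cv::Point2f>> imagePoints;
    for (const cv::Mat& view : views) {
        std::vector<cv::Point2f> corners;
        if (cv::findChessboardCorners(view, boardSize, corners)) {
            objectPoints.push_back(board);
            imagePoints.push_back(corners);
        }
    }

    // Estimates fx, fy, cx, cy and the distortion coefficients (k1, k2, p1, p2, ...).
    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPoints, imagePoints, views.front().size(),
                        cameraMatrix, distCoeffs, rvecs, tvecs);
}
```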

Detection

In the application developed, the characteristic points to detect and track must be coplanar, but the attitude and position could also be estimated from the image projection of a known 3D shape.

The cross shape is interesting because the intersection of the straight lines is invariant under projection onto the camera image plane. If five points are found, they correspond to the cross if one of the points is the intersection of the two segments whose endpoints are the four others. As soon as this criterion is satisfied, the tracking is enabled.
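A minimal sketch of this criterion is given below, assuming the five blob centres are available as cv::Point2f; the function name and pixel tolerance are illustrative, not the original implementation.

```cpp
// Sketch: check whether five points form the cross, i.e. one point lies at the
// intersection of the two segments whose endpoints are the four other points.
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <vector>

// Distance from point p to segment [a, b].
static float distToSegment(cv::Point2f p, cv::Point2f a, cv::Point2f b)
{
    cv::Point2f ab = b - a, ap = p - a;
    float t = ab.dot(ap) / std::max(ab.dot(ab), 1e-6f);
    t = std::min(1.f, std::max(0.f, t));
    cv::Point2f proj = a + t * ab;
    return std::hypot(p.x - proj.x, p.y - proj.y);
}

// Returns the index of the centre point if the five points form a cross, -1 otherwise.
int findCrossCentre(const std::vector<cv::Point2f>& pts, float tolPx)
{
    for (int c = 0; c < 5; ++c) {
        std::vector<int> o;                       // the four other points
        for (int i = 0; i < 5; ++i) if (i != c) o.push_back(i);
        // The three possible pairings of the four points into two segments.
        const int pairs[3][4] = { {0, 1, 2, 3}, {0, 2, 1, 3}, {0, 3, 1, 2} };
        for (const int* q : pairs)
            if (distToSegment(pts[c], pts[o[q[0]]], pts[o[q[1]]]) < tolPx &&
                distToSegment(pts[c], pts[o[q[2]]], pts[o[q[3]]]) < tolPx)
                return c;                         // c lies on both segments
    }
    return -1;
}
```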

On each cross extremity, a colored ball is fixed, and the points of the cross are detected as color blobs in the image. As long as the color blobs are well detected, the center of gravity of a blob corresponds to the projection of the cross extremity, with an error inversely dependent on the size of the blob: the bigger the ball blob, the more accurate the center of mass estimation.

The color detection is performed in the HSV color space. The result is a black and white image whose white pixels have a hue in a defined interval corresponding to the desired color, and a saturation and value above defined thresholds.

A constraint of a color-based method is that none of the colors to be detected may be visible in the environment. To prevent spurious color blobs from being picked up in the environment, the background is subtracted.
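The sketch below illustrates such a detection step with OpenCV, combining background subtraction with an HSV threshold; the threshold values and the function name are illustrative assumptions, not the values used in the experiments.

```cpp
// Sketch: keep pixels that differ from the background AND match the desired color.
#include <opencv2/imgproc.hpp>

cv::Mat detectColorBlobs(const cv::Mat& frame, const cv::Mat& background,
                         int hueMin, int hueMax, int satMin, int valMin)
{
    // Background subtraction: keep pixels that changed with respect to the stored background.
    cv::Mat diff, diffGray, moving;
    cv::absdiff(frame, background, diff);
    cv::cvtColor(diff, diffGray, cv::COLOR_BGR2GRAY);
    cv::threshold(diffGray, moving, 30, 255, cv::THRESH_BINARY);

    // HSV threshold: hue in [hueMin, hueMax], saturation and value above their thresholds.
    cv::Mat hsv, colorMask;
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);
    cv::inRange(hsv, cv::Scalar(hueMin, satMin, valMin),
                cv::Scalar(hueMax, 255, 255), colorMask);

    // A pixel is kept if it has the right color and is not part of the background.
    cv::Mat blobs;
    cv::bitwise_and(colorMask, moving, blobs);
    return blobs;
}
```

The centre of gravity of each remaining blob (computed, for instance, with cv::moments) then gives the image coordinates of the corresponding ball.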

Tracking

The tracking is achieved by recognition of a defined color in the neighborhood of the previous position. A color histogram of each ball of the cross is modeled as soon as the cross is detected; it is then stored for the tracking phase.

Each modeled histogram is then used as a look-up table to generate a probability map (one map for each tracked ball). In this probability image, pixels are given a gray level proportional to the probability that their color belongs to the modeled histogram. A mean shift is computed to locate the center of gravity of the high-probability pixels. The geometric relation of the cross is then verified to validate each tracking step.
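A minimal sketch of one tracking step with OpenCV is given below, assuming a hue histogram was computed for the ball at detection time (e.g. with cv::calcHist) and that a search window around the previous position is available; the parameter values are illustrative.

```cpp
// Sketch: back-project a stored hue histogram and re-locate the ball with mean shift.
#include <opencv2/imgproc.hpp>
#include <opencv2/video/tracking.hpp>

cv::Rect trackBall(const cv::Mat& frame, const cv::Mat& hueHist, cv::Rect window)
{
    cv::Mat hsv, hue, backproj;
    cv::cvtColor(frame, hsv, cv::COLOR_BGR2HSV);

    // Extract the hue channel only.
    int fromTo[] = {0, 0};
    hue.create(hsv.size(), CV_8UC1);
    cv::mixChannels(&hsv, 1, &hue, 1, fromTo, 1);

    // Each pixel receives a grey level proportional to its probability under the histogram.
    float hueRange[] = {0, 180};
    const float* ranges[] = {hueRange};
    cv::calcBackProject(&hue, 1, 0, hueHist, backproj, ranges);

    // Mean shift moves the search window towards the mode of the probability image.
    cv::meanShift(backproj, window,
                  cv::TermCriteria(cv::TermCriteria::EPS | cv::TermCriteria::COUNT, 10, 1));
    return window;   // new position of the ball, to be checked against the cross geometry
}
```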

Pose estimation

Coordinates of each ball of the cross are defined in the cross frame. These coordinates are user-defined and remain fixed. Following (eq. 1), the complete relation between the 2D image coordinates of the projected balls and their 3D coordinates is :

$$s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} [\,R \mid t\,] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} \qquad (2)$$

where $R$ and $t$ are the extrinsic parameters : the rotation matrix and the translation vector relate the world coordinate system to the camera coordinate system. This knowledge provides us with the full pose information (attitude and position).

With knowledge of the 2D image coordinates, the 3D theoretical coordinates of the cross and the intrinsic parameters of the camera, only the rotation and the translation are unknown. The target used to estimate the pose is planar; thus, the z value of each 3D point of the cross is 0. Equation (2) becomes :

$$s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} [\,r_1 \; r_2 \; t\,] \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}$$

where $r_1$ and $r_2$ are the first two columns of $R$. These parameters can be estimated using a least-squares method with the data of four points or more.
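As an illustration, this planar correspondence problem can also be solved directly with OpenCV's cv::solvePnP, which returns the rotation and translation of the target; the ball coordinates below are placeholders, not the actual dimensions of the cross used in the experiments.

```cpp
// Sketch: pose of the planar cross from its five detected ball centres.
#include <opencv2/calib3d.hpp>
#include <vector>

void estimatePose(const std::vector<cv::Point2f>& imagePts,   // detected ball centres,
                                                              // ordered like objectPts
                  const cv::Mat& cameraMatrix, const cv::Mat& distCoeffs,
                  cv::Mat& R, cv::Mat& t)
{
    // Ball coordinates in the cross frame (coplanar, Z = 0); illustrative values in metres.
    std::vector<cv::Point3f> objectPts = {
        { 0.0f,  0.0f, 0.0f},   // centre
        { 0.2f,  0.0f, 0.0f},   // right extremity
        {-0.2f,  0.0f, 0.0f},   // left extremity
        { 0.0f,  0.2f, 0.0f},   // top extremity
        { 0.0f, -0.2f, 0.0f}    // bottom extremity
    };

    cv::Mat rvec, tvec;
    cv::solvePnP(objectPts, imagePts, cameraMatrix, distCoeffs, rvec, tvec);

    cv::Rodrigues(rvec, R);   // rotation vector -> rotation matrix
    t = tvec;                 // position of the cross frame in the camera frame
}
```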

Performance

Tests were conducted in the laboratory in order to measure the error on the attitude estimation. The first sequence is a full turn of the cross around its center (see Figure 2). The full rotation is decomposed into four rotations of 90 degrees (see the four plateaus in the plots of Figures 3 and 7).

Figure 2 : cross in first test sequence.

The first rotation matrix links the camera to the cross. Every subsequent rotation matrix is left-multiplied by the inverse of this initial matrix. Thus, only attitude motions are studied; in our experiments, these motions are rotations around a fixed axis. In this first case, the axis is z and the rotation matrix takes the form :

$$R_z(\theta) = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

The angle between the first position of the cross and the current position is $\theta$. The curve below shows the evolution of this angle :

Figure 3 : variation of $\theta$ in the first test sequence.
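A minimal sketch of how this relative angle can be obtained from two estimated rotation matrices (assumed to be 3x3, double precision) is given below; extracting the angle from the trace of the relative rotation is a standard identity, not necessarily the exact code used here.

```cpp
// Sketch: angle, in degrees, between the initial attitude R0 and the current attitude R.
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>

double relativeAngleDeg(const cv::Mat& R0, const cv::Mat& R)
{
    cv::Mat Rrel = R0.t() * R;                      // R0 orthonormal: inverse = transpose
    double c = (cv::trace(Rrel)[0] - 1.0) / 2.0;    // cos(theta) from the trace
    c = std::max(-1.0, std::min(1.0, c));           // clamp against numerical noise
    return std::acos(c) * 180.0 / CV_PI;
}
```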

No time synchronization was available; all the information we have on the rotation is that it turns four times around the z axis by 90° to complete a full turn. The speed during the rotating phases is constant.

Therefore, a piecewise affine fit of this curve was performed to separate the rotating phases from the stationary phases. The intersections of the affine fits were computed, and the theoretical rotation angle was modeled as linear functions passing through these intersections.
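As an illustration, each phase of such a piecewise affine fit can be obtained from an ordinary least-squares line fit over the corresponding frame range; the sketch below assumes the estimated angle is stored frame by frame and is only one possible implementation.

```cpp
// Sketch: least-squares fit angle[i] ~ a*i + b over the frames [first, last].
#include <cstddef>
#include <utility>
#include <vector>

std::pair<double, double> fitLineSegment(const std::vector<double>& angle,
                                         std::size_t first, std::size_t last)
{
    double n = 0, sx = 0, sy = 0, sxx = 0, sxy = 0;
    for (std::size_t i = first; i <= last; ++i) {
        double x = static_cast<double>(i), y = angle[i];
        n += 1; sx += x; sy += y; sxx += x * x; sxy += x * y;
    }
    double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);   // slope
    double b = (sy - a * sx) / n;                           // intercept
    return {a, b};   // the intersections of consecutive fits separate the phases
}
```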

Figure 4 : Linear interpolation lines (green & red) and theoretical angle variation (blue).

The absolute error was computed as the difference between the variation of the estimated angle and the theoretical variation :

Figure 5 : absolute error between estimation and the theoretical variation.

It shows an absolute error varying between zero and one degree when the cross has made a half turn. This one-degree error is due to the fact that the z axis of the cross does not exactly correspond to the z axis of the table around which the rotation is made.

The corrected error, which corresponds to the difference between the affine fit and the variation of the measured angle, is shown below :

Figure 6 : error between estimation and the linear piecewise regression.

We can thus assume an error of 0.3 degrees at most. The peaks on the curve above are an artifact due to the transition between the static phases and the rotation phases in the piecewise regression.

The second sequence is a 90° turn of the cross around itself, from front view to side view. Figure 6 shows the cross halfway through the rotation.

Figure 6 : cross in second test sequence.

In the second sequence, the errors are about the same, except at the beginning and at the end of the sequence. At the beginning, because this is the worst case for angle estimation: the relative movements are small when the cross is seen frontally and begins to turn. At the end, because the cross is seen from the side and the red ball is half hidden.

The angle made by the cross with respect to the initial position is computed as for the first sequence. The curve obtained is shown in Figure 7.

Figure 7: variation of $\theta$ in the second test sequence.

The absolute error was computed in the same way as before :

Figure 8 : absolute error between estimation and the theoretical variation.

At the end, the error increases significantly but, as explained, the reason is the poor visibility of the balls. The errors at the beginning are almost 1 degree because of the small relative movements of the cross (the worst configuration for estimating the cross movements).

Figure 9 below shows the difference between the affine fit and the variation of the measured angle :

Figure 9 : variation of the estimated angle (blue) and of the theoretical angle (red).

In this configuration, without considering the beginning and the end of the variation, the maximum error is 0.8 degrees.

The attitude and position errors depend mostly on the ratio between the object size and the object-camera distance, and on the focal length. The main source of error is the uncertainty on the image coordinates of the projected cross, which is modeled hereafter.

Error estimation

To compute attitude and position from 2D image coordinates and 3D coordinates, equation (2) must be solved. In homogeneous coordinates, equation (2) is equivalent to:

$$s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = F \,[\,R \mid t\,] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

where $F$ denotes the intrinsic matrix of equation (1). The five points of the cross are coplanar, thus $Z = 0$, and we can write :

$$s \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = F \,[\,r_1 \; r_2 \; t\,] \begin{pmatrix} X \\ Y \\ 1 \end{pmatrix}$$

which is denoted in short by $s\,p = F\,M\,q$, with $p = (u, v, 1)^T$, $M = [\,r_1 \; r_2 \; t\,]$ and $q = (X, Y, 1)^T$. $F$ is invertible and the relation above is equivalent to :

$$s\,F^{-1} p = M\,q$$

$M$ is written in vector form $m$ to find its components using a least-squares estimation : $U_i = A_i\, m$, with $U_i = s_i F^{-1} p_i$ for point $i$ and $A_i$ the matrix built from the cross coordinates $(X_i, Y_i, 1)$ of point $i$.

The system to solve is composed of five times the relation above, one for each of the five points of the cross :

$$U = A\,m, \qquad m = A^{\dagger} U$$

where, with a slight abuse of notation, we now denote $U$ the concatenation of the five previously obtained vectors $U_i$, $A$ the corresponding stacked matrix, and $A^{\dagger}$ the least-squares pseudo-inverse of $A$.
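Assuming the stacked matrix $A$ and vector $U$ above are stored as cv::Mat, the pseudo-inverse solution can be obtained, for instance, with OpenCV's SVD-based solver, as sketched below.

```cpp
// Sketch: minimum-norm least-squares solution m = A^+ U of the stacked system U = A m.
#include <opencv2/core.hpp>

cv::Mat solveLeastSquares(const cv::Mat& A, const cv::Mat& U)
{
    cv::Mat m;
    cv::solve(A, U, m, cv::DECOMP_SVD);   // DECOMP_SVD handles over-determined systems
    return m;
}
```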

In order to measure the precision obtained on the Euler angles, we modeled the problem as follows :

For each configuration (a set of Euler angles), the difference between the theoretical Euler angles and their estimation is computed for a pixel error varying between 0 and 9 pixels. The error is added as a pixel shift either up-right, down-right, up-left or down-left (see Figure 10 below).

Figure 10 : the five configurations of pixel error added to a point of the cross.

An error of 9 pixels is a worst-case scenario; the typical pixel error on the centre estimation is 3 pixels for colored balls 20 pixels wide, as shown in Figure 11 :

Figure 11 : Illustration of pixels error.

For the 5 points of the cross, the error is added so as to cover every possible error configuration (5 points and 4 different pixel error shifts), and over all these error configurations, the maximum and the mean of the error are measured. Thus, for each (Φ, θ, ψ) configuration, a set of runs is performed in order to obtain a statistical sample of the error.
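The sketch below illustrates one such run under the test parameters listed hereafter (640x480 image, 1100 px focal length, no distortion): the cross is projected with a reference pose, every image point is shifted by a chosen pixel offset, the pose is re-estimated (here with cv::solvePnP standing in for the least-squares solution) and the angular deviation is measured. The function name and the single-angle error metric are illustrative assumptions.

```cpp
// Sketch: attitude error induced by a given pixel-shift configuration.
#include <opencv2/calib3d.hpp>
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

double attitudeErrorDeg(const std::vector<cv::Point3f>& crossPts,     // 5 balls, cross frame
                        const cv::Mat& R_ref, const cv::Mat& t_ref,   // reference pose
                        const std::vector<cv::Point2f>& pixelShifts)  // one shift per ball
{
    cv::Mat K = (cv::Mat_<double>(3, 3) << 1100, 0, 320, 0, 1100, 240, 0, 0, 1);
    cv::Mat dist = cv::Mat::zeros(4, 1, CV_64F);   // no distortion in the simulated tests

    // Project the cross with the reference pose, then corrupt the image points.
    cv::Mat rvecRef;
    cv::Rodrigues(R_ref, rvecRef);
    std::vector<cv::Point2f> img;
    cv::projectPoints(crossPts, rvecRef, t_ref, K, dist, img);
    for (std::size_t i = 0; i < img.size(); ++i) img[i] += pixelShifts[i];

    // Re-estimate the pose from the perturbed points.
    cv::Mat rvec, tvec, R;
    cv::solvePnP(crossPts, img, K, dist, rvec, tvec);
    cv::Rodrigues(rvec, R);

    // Angular deviation between reference and estimated attitude, in degrees.
    cv::Mat Rrel = R_ref.t() * R;
    double c = (cv::trace(Rrel)[0] - 1.0) / 2.0;
    c = std::max(-1.0, std::min(1.0, c));
    return std::acos(c) * 180.0 / CV_PI;
}
```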

For the tests, arbitrary values have been taken :

- size of image : 640x480
- focal length : 1100 px
- distortions : zero
- distance from camera : 4 m

Cases studied (results in appendix 1) :

Configuration : roll=0, pitch=0, yaw=0

Configuration : roll=45, pitch=0, yaw=0

Configuration : roll=0, pitch=90, yaw=45

Configuration : roll=45, pitch=45, yaw=45

The four cases above correspond to relevant choices of the Euler angles. For each case, the appendix shows the image of the cross on the camera screen as well as a curve showing the error as a function of the number of pixels of error. For each scenario, two curves are displayed : the maximum error in absolute value (infinity norm) and the mean error in absolute value.

As can be seen, the method shows excellent robustness to disturbances in the image, as the maximal error observed over all the trials is on the order of 1.5 degrees at most for 9 pixels of error, a level of pixel error that was never observed in practice.

Conclusion

The method proposed is an alternative to complex pose measurements of airborne solids. In practice, the method described shows an error of 0.8 degrees at most on our test bed. The theoretical error estimation confirms the robustness of the algorithm to pixel errors: the Euler angle error varies linearly with the pixel error, and its maximum is 0.5 degrees for 3 pixels of error.

References

[1] : OpenCV (Open Source Computer Vision Library).

[2] : Zhengyou Zhang. A Flexible New Technique for Camera Calibration, 1998.

APPENDIX 1

Roll=0, pitch=0, yaw=0

Roll=0, pitch=45, yaw=0

Roll=0, pitch=90, yaw=45

Roll=45, pitch=45, yaw=45
