
This page is for online indexing purposes and should not be included in your printed version.

Author

First Name / Surname / Role / Email
Francisco / Rovira-Más / ASABE Member Engineer /

Affiliation

Organization / Address / Country
Polytechnic University of Valencia / Cno. Vera s/n, 46022 Valencia / Spain

Author

First Name / Surname / Role / Email
Qi / Wang / ASABE Preprofessional

Affiliation

Organization / Address / Country
University of Illinois at Urbana-Champaign / 1304 W. Penn. Ave, Urbana, IL 61801 / USA

Author

First Name / Surname / Role / Email
Qin / Zhang / ASABE Member Engineer /

Affiliation

Organization / Address / Country
University of Illinois at Urbana-Champaign / 1304 W. Penn. Ave, Urbana, IL 61801 / USA

Publication Information

Pub ID / Pub Date
073126 / 2007 ASABE Annual Meeting Paper


An ASABE Meeting Presentation

Paper Number: 073126

The authors are solely responsible for the content of this technical presentation. The technical presentation does not necessarily reflect the official position of the American Society of Agricultural and Biological Engineers (ASABE), and its printing and distribution does not constitute an endorsement of views which may be expressed. Technical presentations are not subject to the formal peer review process by ASABE editorial committees; therefore, they are not to be presented as refereed publications. Citation of this work should state that it is from an ASABE meeting paper. EXAMPLE: Author's Last Name, Initials. 2007. Title of Presentation. ASABE Paper No. 07xxxx. St. Joseph, Mich.: ASABE. For information about securing permission to reprint or reproduce a technical presentation, please contact ASABE at or 269-429-0300 (2950 Niles Road, St. Joseph, MI 49085-9659, USA).

Design of Stereo Perception Systems for Automation of Off-road Vehicles

Francisco Rovira-Más

Polytechnic University of Valencia, Camino de Vera s/n, 46022 Valencia, Spain.

Qi Wang

University of Illinois at Urbana-Champaign, 1304 W. Penn. Ave, Urbana, IL 61801, USA.

Qin Zhang

University of Illinois at Urbana-Champaign, 1304 W. Penn. Ave, Urbana, IL 61801, USA.

Written for presentation at the

2007 ASABE Annual International Meeting

Sponsored by ASABE

Minneapolis Convention Center

Minneapolis, Minnesota

17-20 June 2007

Abstract. Localization and safeguarding are two basic functions that need to be integrated in intelligent machines. Global Navigation Satellite Systems (GNSS) such as GPS suffer from reliability problems caused by inconsistency in the signal, which creates a need for sensor redundancy in vehicle localization. Perception, on the other hand, cannot be achieved through satellite systems and requires the aid of local sensors such as cameras, lasers, or ultrasonic devices. Consequently, localization and safeguarding both benefit from local positioning and short-range perception. Stereo vision is becoming a preferred option to fulfill that mission because it provides a three-dimensional visualization of the targeted scene. However, it carries the disadvantage of demanding data processing and handling, together with the absence of a systematic procedure to determine the most efficient system architecture. Because commercial solutions are so new, there are no clear directions in terms of assembling stereo perception engines. The objective of this paper is to provide a set of recommendations on the configuration of stereo-based perception systems to assist intelligent off-road machines. To do so, several experiments were conducted in which different camera-lens-range combinations were studied. Results showed that, in real-time applications, ranges up to 15 m can be sensed with acceptable accuracy using off-the-shelf compact binocular cameras.

Keywords. Machine vision, Stereo vision, Binocular cameras, Off-road machines, Vehicle perception and localization, Three-dimensional point cloud.


Introduction

Several years have passed since the first stereo vision systems were implemented in the perception engines of agricultural robots, but there are still no clear directions in terms of design and system architecture. Computers keep improving in speed and memory capacity, camera prices are stable or even decreasing, and the potential of the technology is still growing; but how good is stereo compared to other perception sensors? What, specifically, can be sensed with stereo cameras? What problems should be expected? Many questions remain unanswered, and therefore research on this topic is welcome and justified. Machine vision entered the arena of perceptive sensors in the eighties, and today many automatic solutions rely on it. Stereoscopic vision, by contrast, became popular only a decade ago with the release of compact binocular cameras. Three companies pioneered the incipient market; not only did they construct stereo rigs, but they also made available pixel-correlation software that helped to extend this technology, since stereo software is applicable to customized stereo cameras as well. The principal advantage of stereo over conventional monocular vision is the availability of the third dimension, or range. Stereo is considered the most appropriate sensor for generating three-dimensional (3D) representations of reality in real time. It is possible to generate 3D point clouds with nodding lasers and ultrasonic devices, but the need to scan across the entire field of view implies a loss in speed and in the simultaneity of data acquisition. The registration of the 3D point cloud is no longer a problem, as it has been facilitated by commercial systems, but the interpretation and handling of such information for real-time applications is still an issue to be investigated. Poor textures and correlation mismatches produce noisy outcomes that hinder the correct extraction of 3D information. Nevertheless, the advantages of stereo remain numerous, and a reasonable command of stereo data is feasible with today's technology.

Several applications of stereo vision have been developed during the last decade, both in agricultural engineering and in other branches of technology that use robotics and artificial intelligence. Agriculture is itself a very heterogeneous science, and so, obviously, are its technological needs. A generic procedure to deal with stereo data was proposed by Rovira-Más et al. (2006a), where the concepts of 3D density and density grids provided a means to cope with massive point clouds. Three-dimensional mapping, an operation of growing interest in precision agriculture, has been carried out through stereo perception (Rovira-Más et al., 2005). In spite of high requirements in processing speed and memory capacity, sample maps were assembled, and this technique points to a promising future in local mapping. Vehicle automation, on the other hand, benefits greatly from vision applications that imply a direct action on the intelligent machine, that is, when reactive commands are derived from processed stereo outputs. In that regard, one of the highest-potential uses of stereo perception is the automatic steering of off-road vehicles. In principle, any agricultural vehicle can be equipped with such a stereo-based system; proof of that can be found, for example, in the automation of a wheel-type tractor (Kise et al., 2005; Rovira-Más et al., 2004) and in the detection of the crop edge to guide a corn harvester (Rovira-Más et al., 2006b). The desired jump from prototype status to commercial exploitation has been prevented, in most if not all cases, by the insufficient reliability of the tested systems and the consequent hazard to nearby people, buildings, or machines. Potential failures in the vision, electronic, or control systems of the vehicles have slowed, and continue to slow, the incorporation of stereo into industry. Paradoxically, stereo can also contribute to a vehicle's security system through the safeguarding capabilities that 3D perception offers, as demonstrated by Wei et al. (2005) in the obstacle detection system mounted on an agricultural tractor as a safety feature. Many stereo-based perception systems have been reported outside agriculture. Planetary rovers usually have to cope with off-road terrains that somewhat resemble those encountered by farm vehicles; consequently, stereo systems have also been implemented on rovers to assist autonomous navigation and safeguarding, typical applications in which the availability of the camera-to-object distance is essential (Olson et al., 2003). Similar situations are faced by defense vehicles, and DARPA's Grand Challenge provided a rich testbed for stereo systems supplying perception information to fully autonomous off-road vehicles (Bebel et al., 2006; Jones et al., 2006). Humanoids (Sabe et al., 2004) and other robotic applications (Khamene and Negahdaripour, 1999) also rely on stereo vision, but the environments in which they are designed to navigate differ from the scenarios typically found by agricultural machines, and are therefore of less interest for this research.

The objective of this paper is to provide a set of recommendations to assist in the design of the stereo-based perception units of off-road vehicles performing agricultural tasks. The information sought includes the proper combination of physical parameters such as lenses and baseline, and the assessment of coordinate accuracy in the three Cartesian dimensions. A series of experiments was planned to derive various design recommendations, although many alternatives remain to be studied.

System Architecture

Two different camera bodies with interchangeable C-mount lenses were used throughout the experiments. Both compact rigs are commercial all-digital stereo cameras made by Videre Design (Palo Alto, California). The cameras are equipped with 2/3" imagers and use the IEEE-1394 bus for direct digital input. The main difference between the two cameras is their baseline: in the short-baseline rig the lenses are separated by 9 cm, whereas the separation between lenses is 23 cm for the long-baseline camera. Two different pairs of lenses were mounted on the cameras: 8 mm focal length lenses and 12.5 mm focal length lenses. Figure 1 shows a schematic diagram of the system architecture. The computer performs several tasks: first, it powers the camera through the IEEE-1394 cable; second, it acquires the left and right images and, if necessary, saves them to the hard disk; third, the correlation software provided by Videre Design (SRI Small Vision System) generates the disparity map, which can also be saved and holds the key information to create the 3D point cloud; and fourth, a customized program, written in either C++ or Matlab, transforms the 3D cloud to a user-defined coordinate system and saves the arrays of data for further processing and analysis. Also included in the diagram of Fig. 1, although not part of the system architecture, are some parameters relevant to the experimental phase. The composite field of view is, as represented in Fig. 1, the amalgamation of the fields of view of the two lenses, and will therefore be wider (larger horizontal angle) for the long-baseline stereo camera. Each particular camera-lens combination determines a minimum detectable range, which establishes the shortest distance the camera can sense. Long baselines, due to the need for image overlap, produce larger minimum ranges; lenses with long focal lengths also raise the shortest detectable range. The maximum detectable range is, in theory, set at infinity. However, accuracy drops significantly as range increases, and therefore a more useful parameter than the maximum range is the maximum reliable range: the range at which the target objects are detectable with enough definition to be recognized with a low probability of error. It depends on the size of the objects to be identified, the baseline, the focal length of the lenses mounted on the camera, and the resolution of the acquired images. The theoretical range is the distance from the plane where the digital imagers of the camera are located to the (average) plane where the target objects were placed for the experiments, as specified in the Design of Experiments section. The theoretical range is, by definition, greater than the minimum detectable range and smaller than the maximum reliable range. The perception computer was carried by a utility vehicle and powered by additional marine batteries.

Figure 1. Architecture for the stereo perception system.
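The fourth step of the pipeline, converting the disparity map into a 3D point cloud, follows standard pinhole stereo triangulation. The sketch below is illustrative only; it is not the SRI Small Vision System code or the authors' program, and the calibration symbols (f_px, baseline_m, cx, cy) are generic assumptions rather than values from this paper:

```cpp
// Illustrative disparity-to-3D conversion using standard stereo geometry;
// NOT the actual SVS or author code.
#include <cstddef>
#include <vector>

struct Point3D { double x, y, z; };  // x lateral, y vertical, z range (m)

// Depth from disparity: z = f*B/d, with f in pixels, B in meters, and
// d in pixels. (cx, cy) is the principal point of the reference image.
std::vector<Point3D> disparityToCloud(const std::vector<float>& disparity,
                                      int width, int height,
                                      double f_px, double baseline_m,
                                      double cx, double cy)
{
    std::vector<Point3D> cloud;
    cloud.reserve(static_cast<std::size_t>(width) * height);
    for (int v = 0; v < height; ++v) {
        for (int u = 0; u < width; ++u) {
            double d = disparity[static_cast<std::size_t>(v) * width + u];
            if (d <= 0.0) continue;            // no valid correlation match
            double z = f_px * baseline_m / d;  // range
            cloud.push_back({ (u - cx) * z / f_px,   // lateral offset
                              (v - cy) * z / f_px,   // vertical offset
                              z });
        }
    }
    return cloud;
}

// Range resolution: a disparity error of dd pixels maps to a range error
// dz ~= z*z*dd / (f*B), growing quadratically with range. This is why the
// maximum reliable range depends on both baseline and focal length.
double rangeResolution(double z, double f_px, double baseline_m,
                       double dd = 1.0)
{
    return z * z * dd / (f_px * baseline_m);
}
```

A subsequent rigid transform (rotation plus translation) would re-express these camera-frame points in the user-defined ground coordinate system mentioned above.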

Design of Experiments

The principal idea behind this research was to study how several configurations of a basic stereo system perceived a specially built target. The experiments were designed to address such specific questions as: What ranges are feasible with commercial binocular cameras? How can the camera be adjusted (baseline, lenses) to particular needs? What accuracy can be expected? Are commercial solutions reliable enough for intelligent vehicles? Is sensor redundancy an improvement or a necessity? The three parameters that were combined to generate all the configurations studied were the baseline, the focal length of the lenses, and the theoretical range. In order to compare all the combinations fairly, the set of target objects was always the same while the other parameters changed according to Table 1. Since the accuracy in the perception of distances was a crucial element of the analysis, nine objects were arranged in a regular 3 x 3 matrix, with the objects placed in a plane parallel to the camera's imagers. Figure 2 shows the position of the objects and the relevant geometry of the experimental setup. The objects were empty cardboard cubes with 1 ft (0.3 m) sides, with adhesive tiles glued on the sides to create texture. The three columns of cubes were supported by fishing lines attached to a motorized garage door, whose control made it possible to set the height of the objects according to the diagram of Fig. 2b. The photograph of Fig. 2a shows a front view of the object matrix and the placement of the camera during the tests. Figure 2c provides a top view of the setup, where the three positions selected for the camera are indicated by P1, P2, and P3. The concrete yard where the tests were carried out had an irregular slope that had to be estimated and taken into account during the experiments. In particular, taking the ground level under the object matrix as the reference, the ground at the camera positions was lower by 7 cm at P1, 37 cm at P2, and 41 cm at P3. The inclination angle of the camera was always 0°; that is, the camera's imagers were always perpendicular to the ground. The theoretical distances given by P1, P2, and P3 represent the critical space that needs to be sensed in the surroundings of an intelligent off-road vehicle. Table 1 lists the characteristics of the performed tests.

Figure 2. Design of experiments: a) Experimental setup; b) Front view schematic diagram of the experimental setup; and c) Top view diagram of the experimental setup.

Table 1. Experiment specifications based on combinations of basic camera parameters.

Test / Baseline (cm) / Focal length (mm) / Position / Tripod Height (m) / Real Height (m)
1 / 23 / 12.5 / P1 / 1.07 / 1.00
2 / 23 / 12.5 / P2 / 1.37 / 1.00
3 / 23 / 12.5 / P3 / 1.56 / 1.15
4 / 23 / 8 / P1 / 1.22 / 1.15
5 / 23 / 8 / P2 / 1.37 / 1.00
6 / 23 / 8 / P3 / 1.41 / 1.00
7 / 9 / 12.5 / P1 / 1.1 / 1.03
8 / 9 / 12.5 / P2 / 1.12 / 0.75
9 / 9 / 12.5 / P3 / 0.91 / 0.50
10 / 9 / 8 / P3 / 1.01 / 0.60
11 / 9 / 8 / P2 / 1.17 / 0.80
12 / 9 / 8 / P1 / 1.12 / 1.05
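As a worked consistency check derived from the values above, the two height columns are linked by the ground-slope corrections described earlier: the real camera height (measured from the reference ground level under the object matrix) equals the tripod height minus the ground-level difference at the corresponding position. For example, test 3 at P3 gives 1.56 m - 0.41 m = 1.15 m, and test 8 at P2 gives 1.12 m - 0.37 m = 0.75 m.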

Results

For every test specified in Table 1, the three-dimensional point cloud was registered and transformed to the Cartesian coordinates drawn in Fig. 3. A visual inspection of the cloud gives a rough idea of the perception quality: the nine elements of the matrix must be noticeable, regularly spaced, contained in a plane parallel to the image plane, and centered with respect to the camera positions P1, P2, and P3. However, a more quantitative evaluation of the results was needed, and therefore several parameters were defined and measured. The theoretical range R, as marked in Fig. 1, is the actual distance between the camera's image plane and the average plane containing the target objects. Taking into account that the objects forming the target matrix are cubes with 0.3 m (1 ft) sides, the theoretical ranges R for the three camera positions considered are 5.15 m, 10.15 m, and 15.15 m. Another parameter to check is the matrix center offset (Moff), which indicates how far the vertical symmetry axis of the matrix lies from its expected position along the line P1-P3 (Fig. 2c). The three-dimensional view of Fig. 3 shows the matrix center offset for test 1. According to Fig. 2b, the overall dimensions of the matrix are 2.3 m in height (H) and 2.3 m in width (W). The corresponding dimensions deduced from the 3D cloud give an estimate of how accurately the camera perceived the target in height and width: the measured matrix height is denoted by H', and similarly, W' denotes the measured matrix width. These two parameters constitute the planar properties of the cloud. Although these are fundamental data for perceptual tasks, the main advantage of stereo is the availability of the range, that is, the distance from the camera to the target object. The measured range R' was calculated through Equation 1, after estimating the distance from the camera (image plane) to the front side of the boxes (front-side box range FB) and the distance from the camera to the back side of the boxes (back-side box range BB).
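For concreteness, the following hypothetical sketch (in the spirit of the C++ post-processing program mentioned in the System Architecture section, though not the authors' actual analysis code) shows how the planar parameters W', H', and Moff could be extracted from a point cloud; it assumes the cloud has already been cropped to the points belonging to the target matrix, and that x = 0 coincides with the camera's optical axis:

```cpp
// Hypothetical evaluation of the planar cloud quality metrics defined above.
// Assumes `matrixPoints` is non-empty and contains only points on the target
// matrix, expressed in a frame with x lateral, y vertical, z range.
#include <algorithm>
#include <vector>

struct Point3D { double x, y, z; };

struct MatrixMetrics {
    double measuredWidth;   // W'
    double measuredHeight;  // H'
    double centerOffset;    // Moff
};

MatrixMetrics evaluateCloud(const std::vector<Point3D>& matrixPoints)
{
    double xmin = matrixPoints.front().x, xmax = xmin;
    double ymin = matrixPoints.front().y, ymax = ymin;
    for (const Point3D& p : matrixPoints) {
        xmin = std::min(xmin, p.x);  xmax = std::max(xmax, p.x);
        ymin = std::min(ymin, p.y);  ymax = std::max(ymax, p.y);
    }
    return { xmax - xmin,            // W': measured matrix width
             ymax - ymin,            // H': measured matrix height
             0.5 * (xmin + xmax) };  // Moff: deviation of the matrix
                                     // symmetry axis from x = 0
}
```

Comparing W' and H' against the true 2.3 m dimensions, and Moff against zero, quantifies the planar accuracy of each test configuration.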