Object Recognition
Every day we recognize a multitude of familiar and novel objects. We do this with little effort, despite the fact that these objects may vary somewhat in form, color, texture, etc. Objects are recognized from many different vantage points (from the front, side, or back), in many different places, and in different sizes. Objects can even be recognized when they are partially obstructed from view.
While it may be obvious that people are capable of recognizing objects under many variations in conditions, it has been thought that pigeons may not possess the same range of capabilities. It has been proposed that pigeons act as "perceptrons," by analyzing simple features of objects and using those features to recognize objects. If the pigeon were a perceptron, then it would not be able to recognize an object that varied slightly in form or was seen from a novel viewpoint because the features would be altered. Moreover, a pigeon would be unable to discriminate between two objects that contained the same features, but with a different organization.
This chapter addresses a number of fundamental issues relating to object recognition, concentrating particularly on an avian species, the pigeon. The task is to determine whether the basic process of object recognition in pigeons is at all similar to the most probable process that has been proposed for humans. In order to demonstrate the conditions under which object recognition may or may not occur, a number of illustrated examples will be provided.
I. Introduction
This section presents general-level background information, discusses key theoretical concepts, and provides a short statement of the significant findings of the specific experiments. More detailed descriptions can be found in the following sections, to which links are provided throughout. Readers who are well-versed in the basics of object recognition may wish to proceed directly to the "Experiments" section.
One view of object recognition in pigeons
Cerella (1986) proposed that pigeons recognize objects via "particulate perception." That is, pigeons perceive only local features of objects and use those features to recognize specific patterns. He based these conclusions on the results from a series of investigations which indicated that pigeons were responding to local features only.
In one experiment, Cerella (1980) trained pigeons to discriminate intact drawings of Charlie Brown from normal drawings of other Peanuts characters. Then, Cerella reorganized Charlie Brown by altering the relations between the head, torso, and legs. He discovered that the pigeons responded to scrambled versions of Charlie Brown in the same manner as the original, intact drawings. Therefore, he concluded that the pigeon must be insensitive to global organizational properties of objects. Insensitivity to global object properties is one attribute of a particulate perceiver.
How could a particulate perceiver survive in the world? A particulate perceiver would have to rely entirely on differences in local features in order to discriminate and classify objects. Emergent properties of objects such as overall form, spatial organization, and three-dimensional structure would not have an impact on perception. It is difficult to imagine how an organism that flies about in the world could successfully navigate without using any information about the spatial organization of the surrounding environment. The pigeon actually possesses two perceptual systems: (1) a long-range guidance system; and (2) a shorter-range (food) detection system. It is possible that a particulate mechanism operates in the shorter-range, grain-seeking system, which is invoked when smaller objects at closer range are being detected and identified. Perhaps the long-range guidance system does make use of global organizational properties of the surrounding environment.
If the near foveal system of the pigeon does operate using particulate features, then it is possible that the mechanisms of avian visual perception differ substantially from the mechanisms of human visual perception. The avian visual system does differ significantly from the primate visual system in its underlying neuroanatomy and neurophysiology. However, there are analogous pathways and structures. It seems somewhat premature to accept the unparsimonious assumption that the avian (near foveal) visual system differs vastly from our own in terms of the mechanisms of object recognition.
In order to address the differences in mechanisms offered by Cerella's Particulate feature theory (PFT) and theories of human object recognition, I will first describe a prominent account of object recognition in humans, consider its predictions and compare them to the predictions of Particulate feature theory, and then present a series of experiments designed to specifically address any differences in the predictions of the two theories.
A theory of object recognition in humans
Recognition-by-components (RBC; Biederman, 1987) is a theory of object recognition in humans that accounts for the successful identification of objects despite changes in the size or orientation of the image. Moreover, RBC explains how moderately occluded or degraded images, as well as novel examples of objects, are successfully recognized by the visual system.
The major contribution of RBC is the proposal that the visual system extracts geons (or geometric ions) and uses them to identify objects. Geons are simple volumes such as cubes, spheres, cylinders, and wedges. RBC proposes that representations of objects are stored in the brain as structural descriptions. A structural description contains a specification of the object’s geons and their interrelations (e.g., the cube is above the cylinder). A perceived object is analyzed by the visual system, which parses the object into its constituent geons. Then, the interrelations are determined, which include aspects such as relative location and size (e.g., the lamp shade is left-of, below, and larger-than the fixture). The geons and interrelations of the perceived object are matched against stored structural descriptions. If a reasonably good match is found, then successful object recognition will occur. The RBC view of object recognition is analogous to speech perception. A small set of phonemes is combined using organizational rules to produce millions of different words. In RBC, the geons serve as phonemes and the spatial interrelations serve as organizational rules. Biederman (1987) estimated that as few as 36 geons could produce millions of unique objects.
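The matching step described above can be illustrated with a minimal sketch. It is not Biederman's implementation: the class `StructuralDescription`, the functions `match_score` and `recognize`, the example objects, and the 0.5 threshold are all hypothetical choices made for illustration, and the score is simply the fraction of a stored description's geons and relations that also appear in the percept.

```python
from dataclasses import dataclass

# A relation is a triple: (geon_a, relation_name, geon_b).
Relation = tuple[str, str, str]


@dataclass(frozen=True)
class StructuralDescription:
    """Hypothetical container: an object's geons plus their interrelations."""
    name: str
    geons: frozenset[str]
    relations: frozenset[Relation]


def match_score(perceived: StructuralDescription,
                stored: StructuralDescription) -> float:
    """Fraction of the stored geons and relations that the percept contains."""
    stored_items = len(stored.geons) + len(stored.relations)
    if stored_items == 0:
        return 0.0
    shared = (len(perceived.geons & stored.geons)
              + len(perceived.relations & stored.relations))
    return shared / stored_items


def recognize(perceived, memory, threshold=0.5):
    """Return the best-matching stored name, or None if the match is too weak.

    The threshold is an arbitrary illustrative value, not an empirical one.
    """
    best = max(memory, key=lambda s: match_score(perceived, s), default=None)
    if best is None or match_score(perceived, best) < threshold:
        return None
    return best.name


# Illustrative stored descriptions (geon labels are assumptions).
LAMP = StructuralDescription(
    "lamp",
    frozenset({"cone", "cylinder", "brick"}),
    frozenset({("cone", "above", "cylinder"), ("cylinder", "above", "brick")}),
)
MUG = StructuralDescription(
    "mug",
    frozenset({"cylinder", "handle"}),
    frozenset({("handle", "side-of", "cylinder")}),
)
MEMORY = [LAMP, MUG]

# A percept with two of the lamp's three geons in the correct arrangement.
partial_lamp = StructuralDescription(
    "percept",
    frozenset({"cone", "cylinder"}),
    frozenset({("cone", "above", "cylinder")}),
)
# A percept consisting of a single geon and no relations.
single_geon = StructuralDescription("percept", frozenset({"cylinder"}), frozenset())

print(recognize(partial_lamp, MEMORY))
print(recognize(single_geon, MEMORY))
```

In this toy setting an inexact match can still succeed: the two-geon percept clears the threshold for the lamp, whereas a lone cylinder matches nothing well enough to be recognized.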
RBC was developed to account for primal recognition of objects; primal recognition is fast-acting and does not utilize higher-level cognitive processes. Higher-level processing may involve the use of shading, texture, or color in finer discriminations of objects. Additional top-down processing may also occur when environmental cues such as context are used to identify particularly difficult instances of objects (e.g., a pencil would be easier to recognize if it were partially occluded by a stack of papers on a desk than by a pile of leaves in the yard).
Theoretical Predictions
There are three major facts of object recognition in humans that are predicted correctly by RBC, but are at odds with Particulate feature theory. The following sections review these predictions for humans and then summarize recent results examining these issues with pigeons.
1. The correct spatial organization is essential for picture recognition in humans (Biederman, 1972; Biederman, Glass, & Stacey, 1973; Biederman, Rabinowitz, Glass, & Stacey, 1974). Because RBC is based on the assumption that a small set of geons is the basis for millions of objects, organizational rules must play a large role in object recognition. It is possible to have different objects made up of the same parts, so discriminating between those objects necessarily involves a sensitivity to spatial interrelations. This prediction of RBC stands in greatest contrast to Particulate feature theory, because PFT predicts no role for spatial organization.
If pigeons recognize objects using local features alone, then variations in the arrangement of those features would have little or no impact on the accuracy of recognition. Thus, unlike humans, the pigeon would be incapable of discriminating between the cup and the pail. The cup and the pail are composed of two components: a cylinder and a curved handle. However, the orientation and position of the handle relative to the cylinder differ between the objects. In order to discriminate the cup from the pail, one must be able to recognize the differences in the organization of the components, a more global property of objects. Several experiments by Kirkpatrick-Steger, Wasserman, and Biederman have demonstrated that pigeons can discriminate changes in spatial organization, and that spatial organization plays a key role in picture recognition in pigeons.
There is, however, one difference in the local features of the cup and pail: the points of contact (intersections) between the handle and cylinder differ slightly. If pigeons were attentive to fine variations in local features (as PFT argues), then the differences in contact points could prove sufficient to differentiate between these objects. Kirkpatrick-Steger, Wasserman, and Biederman (1998) ruled out the contact points as a significant contributor to object recognition in pigeons.
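The cup and pail contrast can be made concrete with a small sketch. It assumes a deliberately simplified encoding: the part labels, the relation names, and the functions `particulate_code`, `structural_code`, and `discriminable` are hypothetical, and the slight differences in contact points between handle and cylinder are ignored.

```python
from collections import Counter

# Hypothetical part inventories: both objects are a cylinder plus a curved handle.
CUP = {
    "parts": ["cylinder", "curved_handle"],
    "relations": {("curved_handle", "side-of", "cylinder")},
}
PAIL = {
    "parts": ["cylinder", "curved_handle"],
    "relations": {("curved_handle", "above", "cylinder")},
}


def particulate_code(obj):
    """PFT-style code: an unordered bag of local features, arrangement ignored."""
    return Counter(obj["parts"])


def structural_code(obj):
    """RBC-style code: the parts together with the relations among them."""
    return (frozenset(obj["parts"]), frozenset(obj["relations"]))


def discriminable(a, b, encode):
    """Two objects can be told apart only if their codes differ."""
    return encode(a) != encode(b)


print(discriminable(CUP, PAIL, particulate_code))
print(discriminable(CUP, PAIL, structural_code))
```

Under these assumptions, a code built from parts alone assigns the cup and the pail the same representation, while a code that also records the position of the handle relative to the cylinder keeps them distinct.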
2. If a subset of only two or three geons are available and they are in the correct spatial organization, then successful object recognition will occur. RBC predicts this result because object recognition does not require an exact match between the perceived object and stored structural description. In contrast, PFT predicts a detrimental effect of deletion of parts, because the parts are the only means available for recognizing the object.
Biederman, Ju, and Clapper (1985) presented objects lacking some of their components. Human participants correctly identified objects when only 2 or 3 components were available, but not when only 1 component was presented. It is easy to identify the sailboat when only one of the sails is missing. One could also imagine that the hull and mast alone might produce moderately accurate recognition, but it is unlikely that the sailboat would be identified from the mast alone. PFT would predict that the loss of any components leads to a detriment in recognition accuracy. Kirkpatrick-Steger, Wasserman, and Biederman (1998) discovered that pigeons could recognize objects at high levels of accuracy when three of four components were available, but not when only one component was present. They also discovered that some components were recognized better than others, a result that is also consistent with research using human participants.
3. Object recognition in humans is largely invariant with regard to changes in the size, position, and viewpoint of the object. The visual information falling on the retina when a particular object is viewed varies drastically from occasion to occasion, depending on the distance from the object (which affects the size of the image on the retina), the vantage point from which the object is viewed, and the location of the object relative to the viewer (which affects the part of the retina that is stimulated). One of the most fundamental and essential properties of the visual system is the ability to recognize a particular object, despite great variations in the images that impinge on the retina. RBC accounts for all three types of invariances. Invariances in viewpoint (rotational invariance) provide the greatest challenge to PFT.
Rotational Invariance: People are capable of recognizing objects from many different vantage points, even views that have never before been seen (Biederman & Gerhardstein, 1993). Notice that some views of the airplane involve the display of different parts than others. If pigeons recognized simple features alone, then rotational invariance would not occur. Therefore, a model of object recognition, such as PFT, that relies on local features alone would predict that rotational invariance would not be observed. A series of experiments by Wasserman et al. (1996) demonstrated substantial, but not complete, rotational invariance in pigeons.
Size Invariance: Objects can be recognized despite variations in actual or apparent size. Because the size of an object, such as the sailboat, does not change the structural description of an object (the geons and their spatial organization), RBC predicts that recognition should be size invariant. Kirkpatrick-Steger and Wasserman (unpublished data) demonstrated generalization of responding in pigeons to sizes on either side (smaller or larger) of a training size, but there was a generalization decrement at extreme sizes. Successful recognition of objects by pigeons, despite changes in size, further suggests that the mechanism of object recognition in the pigeon is similar to the mechanism of object recognition in humans. However, the finding does not discriminate between RBC and PFT, because PFT also predicts size invariance.
Translational Invariance: When an object is moved to a new position in the environment, a different portion of the retina is stimulated. Nonetheless, modest changes in position do not disrupt recognition accuracy in human subjects; that is, object recognition is translationally invariant. Translational invariance indicates that people do not learn to recognize an object on the basis of the absolute position in the environment or its position relative to other objects (e.g., the desk is right of the bookshelf). Kirkpatrick-Steger, Wasserman, & Biederman (1998) discovered that pigeons performed at high levels of accuracy when an object, such as the Watering Can, was displayed in a new position on the viewing screen. Thus, object recognition in both pigeons and humans appears to be translationally invariant. However, successful translational invariance does not discriminate between RBC and PFT because the object features (and their organization) are unchanged.
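The reason size and position changes leave a structural description intact can be shown with a short sketch. It rests on several assumptions: parts are reduced to a center point and a size in a two-dimensional, viewer-centered frame, the relation vocabulary is limited to the examples used earlier (above, left-of, larger-than), and the function names and watering-can coordinates are hypothetical.

```python
from itertools import combinations


def relations(parts):
    """Derive qualitative relations between every pair of parts.

    `parts` maps a part name to (x, y, size), with y increasing upward.
    """
    found = set()
    for (a, (ax, ay, asz)), (b, (bx, by, bsz)) in combinations(sorted(parts.items()), 2):
        if ay != by:
            found.add((a, "above" if ay > by else "below", b))
        if ax != bx:
            found.add((a, "right-of" if ax > bx else "left-of", b))
        if asz != bsz:
            found.add((a, "larger-than" if asz > bsz else "smaller-than", b))
    return found


def translate(parts, dx, dy):
    """Move the whole object to a new position."""
    return {n: (x + dx, y + dy, s) for n, (x, y, s) in parts.items()}


def scale(parts, k):
    """Uniformly resize the whole object (k is assumed to be positive)."""
    return {n: (x * k, y * k, s * k) for n, (x, y, s) in parts.items()}


# Made-up layout for a watering can: body, spout to the right, handle to the left.
watering_can = {"body": (0, 0, 4), "spout": (3, 1, 2), "handle": (-2, 1, 1)}

original = relations(watering_can)
print(relations(translate(watering_can, 10, -5)) == original)
print(relations(scale(watering_can, 3)) == original)
```

Because the relations are defined between parts rather than against absolute screen coordinates, moving or uniformly resizing the object yields the same set of relations. Rotation in depth is not covered by this sketch, since it can hide or reveal parts and change viewer-centered relations.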
Conclusions
There are many similarities in the properties of object recognition in pigeons and humans, suggesting that similar mechanisms may be employed. For example, both pigeons and humans are sensitive to object components and their spatial organization. However, not all of the components must be present in order for successful recognition to occur; only a subset of two or three components is needed, provided that they appear in the correct spatial organization. This explains why object recognition can occur even when objects are partially occluded. Finally, there is good evidence for rotational, size, and translational invariances in both pigeons and people. These broad similarities suggest that a common theory may be applied in explaining object recognition in both species. PFT clearly cannot account for the pattern of results. RBC does correctly predict all of the major findings, but other theories, such as the new generation of template models, offer similar predictions. Further experiments will undoubtedly be needed in order to differentiate between rival theories.