Chapter 3. Selection: The Key to Linking Representations and Things

Pylyshyn: Nicod Lectures, Lecture 2

Selecting and Locating

3.1.Selection: The role of selective attention

3.1.1.Allocating and shifting attention: The role of objects and places <SLIDES 4-18>

3.1.2.Attention selects and adheres to objects

3.2.More on what is selected by FINSTs

3.2.1.Causes and codes

3.2.2.Conceptual, nonconceptual and quasi-representational contents

3.3.The relevance of the present research to understanding what happens in early vision <SLIDE 27-28>

3.3.1.Nonconceptual representation and Feature Placing

1)Feature placing and the binding problem.

2)Feature-placing and the causal link

3)Feature-placing and nonconceptual access

3.3.2.What do FINSTs select? Some consequences of this view

3.4.Summary

Chapter 3. Selection: The key to linking representations and things

3.1.Selection: The role of selective attention

We have been discussing the connection between the world we perceive and mental representations. This topic has a way of returning again and again to the notion of selection. Selection is a central topic in contemporary cognitive science and, as we shall see, it is also the place where empirical cognitive science comes in contact with a series of problems in the philosophy of mind that are of concern in this book. Selection enters the empirical discipline via the study of what has been called focal attention. From our perspective the study of attention should also be where we find clues about what the visual system picks out for further analysis. Clearly focal attention and what I have been calling Indexing are very similar. On the face of it the major difference would appear to be that we have several (perhaps 4 or 5) Indexes that work independently not only to select but also to provide a reference to things in the world. Perhaps if we examine the experimental literature on visual attention we may find there some evidence about what sorts of things can be selected and also what the selection is for.

The general view in psychology is that attention is the mechanism by which cognition is able to make selections from among the various aspects of the perceived world, and that the ability to make selections is at the very core of how we interact with the world [Patrick Cavanagh refers to attention as the mechanism that “exports vision to the mind”, see \Cavanagh, 1999 #955]. This, however, leaves a great deal unsaid and raises questions that are at the heart of our present concern.

(1) The first question that the notion of selection raises is: Why? Why should the mind select and what role does selection play? The usual, and probably universally accepted, answer is that we must select simply because our capacity to take in information is limited. Being incapable of taking in everything, we must perforce select, and we do so by applying what Donald Broadbent, one of the founders of modern information processing psychology, described as a filter [Broadbent, 1958 #779]. That the mind is limited and therefore has to be selective is unquestionably true, but it is far from being the whole story about the function of focal attention. (Even the part of the story that it correctly points to is highly incomplete. If the mind is limited, along what dimensions is it limited? And if it has to select, on what basis and along what dimensions does it – or can it – select?)

(2) It has also become clear that selection is needed not only to keep relatively unimportant or irrelevant information from clogging the mind, but also for reasons that have nothing to do with the mind’s having a limited capacity. It would be needed even if we were like the Martians in Heinlein’s cult science fiction novel Stranger in a Strange Land who could “grok” the entire perceptible world in one swallow. We would need it because in order to analyze and encode certain properties of the world we have to distinguish some parts of the visible scene from other parts; in particular, as Gestalt Psychologists pointed out in the last century, we must distinguish a focal figure from a background, or distinguish between a this and a not-this. Since perception makes such figure-ground distinctions for moving things as well as stationary ones, more than just selection must be occurring: perception identifies the thing selected as an enduring individual independent of its instantaneous location. This, in turn, suggests that there is a mechanism in perception that allows us to refer to things in some way and keep track of their continuing identity. Thus focal attention may be thought of as a mechanism by which we pick out and refer to things we perceive [as \Campbell, 2003 #1630, argued]. FINST theory postulates a generalization of focal attention to multiple things (although FINST indexes are different from focal attention in a number of important ways). As we saw earlier, we need to pick out several things at once in order to detect patterns among them. Thus the need to individuate and refer to things provides a second reason why we have to select items and why focal attention is a central concern in any discussion of how the mind connects with the world through perception. But there is yet another reason why we have to select certain sorts of things with attention – and indeed why what we select has to be things rather than places.

(3) The third reason we need selection has been explored both in the experimental psychology literature and in the philosophical literature. It is the fact that properties – or what finds expression as predicates – come in certain sorts of bundles or groups. The question of how our perceptual system manages to decode these bundles of properties has come to be called the binding problem [a term associated with the work of Anne Treisman, e.g., see the review in \Treisman, 1988 #483]. When properties are properties of the same thing or the same sensory individual they must be marked somehow as conjoined properties, not merely as properties present in the scene. The earliest stages of vision cannot simply report the presence of properties. They must, in addition, in some way preserve the information that groups of properties belong to the same sensory individual, so that we can distinguish, say, a scene containing a green circle and a red square from a scene containing a red circle and a green square, or a large striped animal coming towards us from a large striped animal going away from us, or a small furry animal coming towards us from a large furry animal going away from us, and all combinations of these features. Indeed, perception must provide the information in a form that enables us to distinguish between a green object that goes quack and a red object that goes moo, so the requirement holds across different modalities. The problem of providing this information in the right sorts of bundles, which is called the binding problem [or which \Jackson, 1997 #1660, called the "many properties problem"], is crucial for our survival as well as being important in understanding how vision connects with the world. Although I reserve discussion of this problem until I examine Austen Clark’s analysis in section 3.3, I mention it here because we will see that the solution involves object-based selection in a crucial manner.
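The contrast between merely reporting which properties are present and reporting which properties go together can be made concrete with a small sketch. The representation below (a scene as a list of property dictionaries, and the two hypothetical functions `feature_report` and `bound_report`) is purely an illustrative assumption and is not a claim about how early vision encodes anything; it only shows why an unbundled inventory of features cannot tell the two example scenes apart.

```python
# Two scenes from the text: they contain exactly the same features,
# but conjoined differently. (Illustrative encoding, assumed here.)
scene_a = [{"color": "green", "shape": "circle"},
           {"color": "red", "shape": "square"}]
scene_b = [{"color": "red", "shape": "circle"},
           {"color": "green", "shape": "square"}]


def feature_report(scene):
    """Unbound report: the set of features present anywhere in the scene."""
    return {value for individual in scene for value in individual.values()}


def bound_report(scene):
    """Bound report: one bundle of conjoined features per individual."""
    return {frozenset(individual.items()) for individual in scene}


if __name__ == "__main__":
    # The unbound inventories are identical, so the scenes are confused.
    print(feature_report(scene_a) == feature_report(scene_b))
    # The bundled reports differ, so the scenes can be told apart.
    print(bound_report(scene_a) != bound_report(scene_b))
```

Only the second kind of report carries the information that green goes with circle in one scene and with square in the other, which is the information the binding problem asks perception to preserve.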

3.1.1.Allocating and shifting attention: The role of objects and places <SLIDES 4-18>

In recent years experimental psychologists have distinguished two ways in which attention can be allocated in a visual scene. One way, referred to as exogenous attention allocation, depends on events in the world – it is said to be data-driven. This form of attention allocation begins with an event in the visual scene that captures attention automatically without the intervention of volition or of cognition more generally. Some event – most notably the appearance of a new object in the visual field – captures attention [though other sorts of events, such as a sudden change in luminance, will do as well \Franconeri, 2005 #1687]. The other way of allocating attention occurs when you are searching for something and sweep your attention around a scene. It is called endogenous or voluntary attention allocation. An early demonstration of both types of attention switching is illustrated in Figure 31. Attention that has been allocated by these two means differs in a number of subtle but important ways. Exogenous or automatic attention allocation is the more important form of attention shift. It is more rapid and reaches a higher level of facilitation than attention that is moved voluntarily. Voluntary shifts of attention are easily disrupted by the automatic exogenous pull of visual events occurring at the same time [Rauschenberger, 2004 #1652; Mueller, 1989 #1334].

Figure 31. Illustration of how attention may be exogenously captured by a sudden brightening, shown in the second panel. Performance on a detection task (third panel) is better at the cued location than at the uncued location and at intermediate times is also better at a location along the path between fixation and where attention is moving to. For endogenous attention movement the subject is told (or shown by the appearance of an arrow at the fixation point) which direction to move attention. [Posner, 1980 #199]

An important finding comes from a series of experiments and a mathematical analysis by [Sperling, 1995 #1404]. These authors have made a persuasive empirical case that, at least in the case of automatically shifted attention, the locus of attention does not actually move continuously through space. The apparent movement may instead arise because of the distribution of activation over time and space when attention is captured and switches from one thing to another. According to this analysis, the apparent moving pattern may arise because the degree of attentional activation gradually decreases at its starting location and gradually increases at its target location. When these two spatiotemporal distributions are summed at intermediate locations the result is an apparently moving activation-maximum. Because voluntary shifts typically do not have a target event to which they are automatically drawn, it is possible that they may sweep through intermediate positions. Although the data are not univocal on this question, it is plausible that when you move your attention voluntarily in a certain direction, you may sweep the locus of your visual attention through intermediate empty locations (so-called “analogue movement” of attention). I would caution against taking this as proven, however, since Jonathan Cohen and I showed that when people try to move their attention through an empty region in the dark (extrapolating the motion of a visible object that disappears behind an occluding surface) they probably do not move their focal attention through a continuous sequence of empty locations [Pylyshyn, 1999 #999]. This conclusion is based on the finding that they perform poorly at continuously tracking with their attention the location where the invisible object is at any moment. Given a series of visible anchor points along the invisible trajectory they do much better. Thus we concluded that their attention normally jumps from one visible feature to another using their highly precise Time-to-Contact estimation skill [for more on the latter, see \Tresillian, 1995 #1037]. This conclusion is also supported by experiments by [Gilden, 1995 #1038] showing that when a subject tries to track the imagined continuous motion of an invisible object, the pattern of transit times obtained in an adaptation experiment (when the imagined motion crosses an area adapted to motion) is consistent with the hypothesis that the moving attention consists of a series of static episodes rather than of a continuous movement [1]. So movement of attention may not be continuous even in the case of endogenously or voluntarily generated movement of attention.
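The idea that a moving activation-maximum can arise from two stationary distributions, one fading and one rising, can be illustrated with a toy calculation. The sketch below is not the analysis cited above: the Gaussian shape of each distribution, the linear fade and rise, and all parameter values (`start`, `target`, `sigma`) are assumptions chosen only to show the effect.

```python
import math


def gaussian(x, center, sigma):
    """Unnormalized Gaussian bump centered at `center`."""
    return math.exp(-((x - center) ** 2) / (2 * sigma ** 2))


def activation(x, t, start=0.0, target=2.0, sigma=1.5):
    """Summed activation at position x and time t in [0, 1].

    Assumed for illustration: activation at the starting location
    fades linearly while activation at the target rises linearly.
    Neither distribution changes its position.
    """
    fading = (1.0 - t) * gaussian(x, start, sigma)
    rising = t * gaussian(x, target, sigma)
    return fading + rising


def peak_location(t, lo=-1.0, hi=3.0, steps=400):
    """Position (on a grid) where the summed activation is greatest."""
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(xs, key=lambda x: activation(x, t))


if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"t={t:.2f}  peak of summed activation at x={peak_location(t):.2f}")
```

With these assumed values the peak of the sum passes through intermediate positions even though nothing in the model moves. The effect depends on the two distributions overlapping substantially: if they are made narrow relative to their separation, the sum has two separate bumps and the maximum jumps from one to the other instead of gliding.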

3.1.2.Attention selects and adheres to objects

A great deal of research in the past twenty years has convinced psychologists that viewing selective attention as being location-based is either incorrect or at the very least a secondary part of the story of attention allocation. An increasing number of studies have concluded that we attend to what I have been calling things (and what the psychological literature refers to as “objects”) rather than empty places. Evidence for this came initially from demonstrations of what is called single-object advantage. When a pair of judgments is made, the judgments are faster when they pertain to the same perceptual individual, even when all geometrical properties are controlled for, as shown for example by the experiment illustrated in Figure 32.

Figure 32. Task used in [Baylis, 1993 #882] to illustrate single-object superiority. The task was to judge whether the left or the right vertex was higher. Judgments were faster when the vertexes were perceived as belonging to the same figure than when they were perceived as belonging to two different figures. In subsequent studies [Baylis, 1994 #1045] the effect of other stimulus properties (such as convexity) was ruled out.

There is also evidence that the effect of attention tends to spread from where it is initially attracted to cover an entire larger object that subtends that initial attractor. For example, when attention is attracted to the highlighted end of a bar it then spreads to the entire bar. This spread does not simply terminate on an edge, but proceeds through what is perceived as the entire bar even if the entire bar is not explicitly indicated by a contour but is created by an illusory process called “amodal completion”, as shown in Figure 33. Thus it is the bar as a perceptual whole object that determines how attention spreads.

Figure 33. When attention is drawn to one end of a bar (marked A) by a cue (e.g., the brightening of the contour, indicated here by the dotted lines), its effect can be observed at the other end by the faster detection of a probe at that location (marked B), while the equally distant location on another bar (marked C) is not enhanced. This is true so long as A and B are perceived to be on the same bar (which they are in panels 1-3, but not in panel 4) [Adapted from \Moore, 1998 #1398].

Even more persuasive are studies showing that attention moves with objects being attended. A variety of phenomena of attention – including attentional enhancement (in the form of priming) and the negative effect of attention on the ability to re-attend to the same thing a short time later – show that attention appears to stick with objects rather than remaining at the attended location. The first case is illustrated by studies in [Kahneman, 1992 #827] which show what is referred to as Object-Specific-Priming-Benefit (OSPB). In such studies the time to make a judgment about the identity of a pattern, such as a letter, is shortened when the letter occurs in the same object (typically a box frame) in which it had previously occurred [there is even evidence that such priming may last up to 8 seconds, \Noles, 2005 #1634]. This phenomenon has also been found with MOT experiments of the sort described in Chapter 2. The second case is illustrated by the phenomenon called Inhibition of Return, wherein attention is slow to return to something that was attended some 300 to 900 milliseconds earlier. Figure 34 illustrates an experiment showing that Inhibition of Return appears to move with the formerly attended object, rather than affecting the place that had been attended.

Figure 34. When attention is captured exogenously by an object and then disengaged, it takes more time to re-attend to that object (as measured, say, by how well observers can detect a dim spot on that object). This Inhibition of Return appears to move with the attended object [Tipper, 1991 #939].

There has also been a suggestion in some of these studies that location may be inhibited as well in this case [Tipper, 1994 #1065]. But the locations in question in these dissenting findings are typically either nonempty or they are in a simple relation to indexed objects (e.g., halfway between two objects). It is doubtful that a location in a uniform unstructured field [what is called the Ganzfeld, \Avant, 1965 #968] can be selected or inhibited – indeed after a few minutes of staring into a Ganzfeld people tend to get disoriented and cannot even look back to some location they had just attended. Without objects in the field of view attention appears to be unable to get a grip on an empty location [failure to find inhibition at empty locations was also reported recently using Multiple Object Tracking, \Pylyshyn, 2006 #1619].

Visual attention is a much-studied phenomenon and a great deal of evidence is available, of which I have presented only a few illustrative examples. From our perspective what is important is that there is considerable evidence that sensory objects attract and maintain focal attention and that the evidence for the more common-sense notion that attention can be allocated to empty spaces is far from being univocal.

3.2.More on what is selected by FINSTs

In Chapters 1 and 2 I presented an outline of a theory of visual indexing (sometimes called the FINST theory). According to this theory, things in the world capture (or as we sometimes say, grab) one of a small number of available FINST Indexes, which thereafter are available for referring to the things that were the cause of the capturing. In describing matters in this way I emphasize the role of an index as a reference mechanism. Indexes act like pointers in computer data structures: they provide a reference to some sensory individual (the nature of which has yet to be specified) without themselves serving as a code for any property of the individual that is indexed.

The parallel between pointers in a computer and FINST indexes is quite exact and helps to clarify what is being claimed, so it deserves a brief aside. The terms pointer or address are misleading in that they both connote a location. But in fact neither refers to an actual location. When we use the term pointer we are referring to a different level of abstractness from that of physical locations. Moreover, a pointer does not pick out a location in any kind of space, in even the most extended sense, including some sort of “functional space” since such a “space” does not have the minimal properties of a metric space – as I will argue in the next chapter. A better way to view a pointer is as a name or a singular term. But the question still remains: what does such a singular term refer to? I claim that it refers to what I have been calling things, meaning sensory individuals, or visual objects. Not places where things are located, but individual things themselves. At the end of this chapter I will return to the question that people often ask: Why don’t I simply claim that they refer to physical objects? And why do I insist that the indexes do not select things by pointing to places where things are located? More on this presently.
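For readers who find the pointer analogy easier to follow in code, here is a minimal sketch. It is only an analogy rendered in a programming language, not a model of the visual system: the class names `VisualObject` and `IndexPool`, the method names, and the capacity of 4 are all hypothetical choices made for illustration. The point is that an index holds a reference to an individual and nothing else, so it continues to pick out the same individual when that individual's location and other properties change.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # individuals are compared by identity, not by properties
class VisualObject:
    x: float
    y: float
    color: str


class IndexPool:
    """A small pool of indexes (hypothetical; capacity of 4 is assumed)."""

    def __init__(self, capacity=4):
        self.capacity = capacity
        self._slots = {}  # index name -> reference to an individual

    def grab(self, name, thing):
        """Bind an index to an individual; fail if no index is free."""
        if name not in self._slots and len(self._slots) >= self.capacity:
            return False
        self._slots[name] = thing  # stores a reference, no properties
        return True

    def deref(self, name):
        """Return the individual the index currently refers to."""
        return self._slots[name]


if __name__ == "__main__":
    pool = IndexPool()
    target = VisualObject(x=0.0, y=0.0, color="red")
    pool.grab("index1", target)

    # The individual moves and changes color; the index was never told.
    target.x, target.color = 5.0, "green"
    print(pool.deref("index1") is target)

    # A different individual with identical properties is not the referent.
    twin = VisualObject(x=5.0, y=0.0, color="green")
    print(pool.deref("index1") is twin)
```

Nothing stored in the pool encodes where the individual is or what it looks like, which is the sense in which the index behaves like a name or singular term rather than a description or a location.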