Jerry,

Sorry: I have been having trouble getting my brain awake and in gear. So I keep writing blather. But we are going out soon so I will send you what I have. I may be seeing a doctor on Monday in your area so if you like I can stop by (briefly because I still have to get this ready for Tuesday’s class).

Assume that an Indexis the name of an object-token. Having atoken of the name is like having the name of some variable in a computer – it enables you to access its value or whatever is the named thing. Of course the usual thing in a computer is that you access some symbol in the machine, but that needn’t be the case. I see no reason why given the name of an object the computer could not turn its video camera to it. In that case it might have to have the object’s cylindrical coordinates, focal distance and brightness, but that’s all part of the backstage (subpersonal, modular) machinery. But that isn’t how I want to think of it.

Given the name of an object (e.g., o) I have been assuming that the visual system can do such things as move its attention to it without moving its gaze. In some sense this is asking the mind to do something to some non-mental thing and it’s true that this is usually not possible and it may even be true that Joe Levine says it can’t be done directly but has to be mediated by a mental representation. So be it. Butit still remains the case that you can accomplish the function of switching your attention to a named object in your field of view(either by continuous movement, like many people believe, or without moving through intermediate locations, as happens in Sperling’s recent model).

One of the main assumptions about FINSTs is that these indexes allow the evaluation of predicates such as Red(x) or Beside(x,y) where x and y are indexes that are bound to and thereby refer to two distinct objects in the field of view (why 2 objects and not two views of one object? Good question and one I had worried about because it requires additional bells and whistles, but it does not require a miracle. I think it shows either than index binding occurs later in the visua module – after distinct views are computed – or it occurs early and cannot select views at all, only different objects. Given that we are dealing with preconceptual processes the latter makes more sense. More on thatlater (maybe).

There was also the question of whether you can interrogate an object knowing only its name (or the name of its FINST). That question, it seems to me, is one clouded in unnecessary mystery. The way you do that is just the way you direct your attention (and not your gaze) to an object of interest. You do it in all cases by doing something to the proximal stimulus which we assume maps any relevant properties of the distal stimulus of interest. It’s true that you see distal objects and not other links in the causal chain. But that does not mean you could not affect your perception of the distal object by doing something to its proximal link in the chain. You could, for example, break the chain by turning off an intermediate link, thereby preventing perception of the relevant distal object or making it less salient. Similarly, you could check whether the cells at the proximal stimulus are able activate a P detector, etc.

In Seeing and Visualizing I suggest a connectionist-type model for combining the detection of a property with a circuit that inhibits all but one region of the proximal stimulus and thereby detects the presence of a property at a particular object, where the object is not identified by its location but by a name that simply identifies it as one of the objects that caused an index to be assigned. The network is not smart butit reveals that there is no mystery in detecting P-at-Object-O. The name of the object is available only temporarily or it can be made to track the object so long as the object moves continuously (so the tracking through occlusion requires some additional mechanism or assumptions) by a technique suggested by Koch & Ullman – make the circuit enhance locations on the proximal stimulus that are near the currently selected one.

In case you are interested I provide several levels of detail of the network (including a circuit at the end and the Appendix of Chapter 5 of Seeing and Visualizing). A summary is that the model shows that it is possibletoineffect send a signal (a pulse) to a distal object by manipulating its current proximal projection (you are right that you could not send a signal to the distal object – the best you can do is ‘open/close your eyes’ to it). In this network the following happens. A set of transducers is activated by incoming information. Each point on the proximal stimulus has lots of punctate transducers. A transduceris partially (subthreshold) activatedby the presence of some feature, say, feature F. What will happen eventually is that a signal will be sent to selected transducers on the proximal stimulus. If that transducer is partially activated it will go over threshold and fire. Since all transducers of the particular type (which detect a particular feature) are connected to a logical disjunction network, the output of the mechanism will be the information that there is F somewhere on the proximal stimulus. What makes it possible to ask this question of a particular location on the proximal stimulus is the Winner-Take-All (WTA) network that shuts down all transducers except the one that has the highest activity. Putting aside the details, it does it in a manner familiar to connectionist network builders; by using a WTA network that settles into just one node (or one path through the network), being active at the end of a series of adjustments known as a “relaxation”. Each place in the proximal stimulus (call that a proximal place or PP) thus has several links going to it: one from each property transducer at that location, one from the WTA network which ensures that only the most active (the most salient) PP is able to respond – the other PP units having been shut down.The PP also receives one input from a selected feature inquiry channel (depicted in the attached figure as a button for each property). The end result is that you can open the channel to one PP by pulsing it with a signal corresponding to the property you want to check on, so you can find out whether the currently most active object (or some other activated/indexed object) has property P. You can also find out other properties that it has because you have an open channel to that object. This simple picture does not deal with multiple objects or moving objects, but they can be fit into the same general picture with additional mechanisms of the same general type.