Return of the Mental Image: Are There Pictures in the Brain?

Zenon Pylyshyn, Rutgers Center for Cognitive Science

Abstract

In the past few years there appears to have been a revival of interest in the study of mental imagery. Emboldened by new findings from neuroscience, some people have returned to the idea that mental imagery involves a special format of thought, one that is more like a picture than a sentence or a logical calculus. But the evidence and the arguments that caused both conceptual and empirical problems for the picture theory in the past 30 years have not gone away, and the new evidence does little to justify the recidivist trend.

The format of thought

There is widespread agreement (among both psychologists and laypeople) that thought comes in at least two flavors, pictorial (or visual) and verbal. This idea, which derives from our own very persuasive subjective experience, has been enshrined in what is called the “dual code” view of thinking and memory [1]. Yet neither the claim that thought is expressed in what we experience as visual images nor the claim that it is expressed in what we experience as inner dialogue can be correct because it is easy to show that most (perhaps all) of our thoughts are beyond the reach of our conscious experience.

The words we appear to think with carry very little, if any, of our thoughts, since they presuppose much that is not expressed in what we experience as “inner dialogue”. Just as in our discourse with one another most of what we communicate is unstated, so in our inner dialogue we rely on ellipses, demonstratives (like “this” or “that”), deixis (like “me” or “now”), indirect references, presuppositions, entailments and so on. I think to myself, “I’d better finish this part of the paper or I will be late for my meeting.” But what is “this part” and how big a part is it? What meeting would I be late for? How late would I have to be for it to count as being “late for a meeting”? And who does “I” refer to? The thought also presupposes that I want to go to some meeting, otherwise I would not have thought about being late for it – though that is unsaid. In our inner dialogue we follow the same rules of discourse that we follow in conversation, where the principle is that we should not state what we believe the hearer already knows. But if we leave things unsaid in inner conversation, then the unsaid part of our thoughts must have been in some other format that we did not experience, so at least that part of the thought was not expressed in language after all. Similarly, when we think in pictures, what the pictures represent, how we interpret them and what they mean cannot be found in the pictures themselves. All pictures are deeply ambiguous. As the philosopher Wittgenstein pointed out, a picture of a person walking up a hill is identical to a picture of a person walking down a hill backwards. But our images are never ambiguous, since we know what we intend them to represent, and this knowledge itself is neither verbal nor pictorial.

More recently people have focused primarily on the pictorial aspect of thought. Even if most of our thoughts are unconscious, it could still be that those that are experienced as visual images play an important part in our mental life. Famous thinkers are always being quoted as saying that their ideas did not come to them logically but appeared to them in mental pictures [2]. What exactly this means, other than that the person had a vision-like experience, is far from clear. The experimental evidence for the assumption that we think using a picture-like format (or, as it is sometimes referred to, a depictive format) is far from compelling, even if what is being claimed could be made clear. I have argued that the differences between pictorial and other (verbal, logical?) forms of reasoning that are observed in experiments are more likely due to what our thoughts are about than to the form that these thoughts take.

Discussions of mental imagery very often confound the form (or format) of thoughts with their content, or what they are about. There is clearly a difference between thinking about how something looks and carrying on an inner dialogue about abstract ideas. But this is a difference in topic, much like the difference between discussing the newest fashions in clothing and having a discussion about freedom of the will. This difference, by itself, might well account for such experimental findings as that vision interferes with mental imagery, since one might just as reasonably expect that thinking about some topic (say arithmetic) might be more difficult if one were trying to ignore an irrelevant discussion on the radio on that same topic. Because thinking about how something looks is different in so many ways from other kinds of thoughts (e.g., it focuses strongly on quantitative spatial relations), it does seem reasonable that there might be a different format for visual information than for nonvisual information. Yet if there is something special about the format in which we think when we have the experience of “seeing an image” in our mind, science has not yet revealed what it is, despite some 30 years of very active research.

The picture theory of mental images

Despite the uncertainty about how we might distinguish different formats of thought, there continue to be persistent claims that mental images have a special picture-like format that has been referred to as “depictive”. One of the few explicit statements concerning what this means ([3], p. 5) states that a depictive representation is “… a type of picture, which specifies the locations and values of configurations of points in a space. … [in which] each part of an object is represented by a pattern of points, and the spatial relation among these patterns … correspond to the spatial relations among the parts themselves. … not only is the shape of the represented parts immediately available to appropriate processes, but so is the shape of the empty space … one cannot represent a shape in a depictive representation without also specifying a size and orientation….” For obvious reasons I refer to theories like this as “picture theories”, since their main emphasis is on the pictorial or iconic aspect of the format of mental images.

Over the past three decades a very large number of experiments have been carried out and cited in support of the picture theory. Among the most widely cited is the finding that it takes longer to scan one’s attention a greater distance on an image [4] (see Box A). Other experiments appear to show that it takes longer to “see” some visual detail in a “small” image than in a “large” image (e.g., it takes less time to report seeing whiskers on a very large image of a mouse than on a small image of a mouse). Related experiments show that when people are asked to turn their head and judge from their image when a pair of points would become indistinguishable, or when they are asked to make judgments about the identity of imagined vertical, horizontal or oblique gratings, they give very nearly the same results as those they would have produced if they had been presented with the patterns visually. Other experiments appear to show that people can only remember certain things (e.g., which side of your front door the knob is on) by first recalling an image and “reading off” the result, and that they tend to commit themselves to details that would be gratuitous if they were not constructing a picture-like representation (e.g., when asked to imagine a printed word, you tend to select either upper case or lower case letters and not something indeterminate). There are also a very large number of studies that examine whether entertaining an image impairs or enhances performance on certain visual tasks, on visuo-motor tasks, on recall tasks, on adaptation tasks, and so on. In each case researchers concluded that the results supported the assumption that images are pictorial entities that are examined with the visual system [see the extensive review in 5,6].

Although some version of the picture theory of mental imagery is very nearly universally accepted, it has turned out to be rife with both conceptual and empirical problems. One of the main difficulties that picture theories run into is that every experimental finding cited in support of the “pictorial” aspect of mental imagery can be more easily and more naturally explained by the simple hypothesis that when asked to imagine something, people ask themselves what it would be like for them to see it, and then they simulate as many aspects of this staged event as seem relevant, as they know about, and as they are able to mimic. Thus for example, when asked to imagine a printed word we choose whether to imagine printing in lower or upper case because a real printed word would be one or the other (on the other hand since we cannot represent every property of a printed word, we most likely do not choose a particular font or type of paper).

I have looked at a very large number of experiments in some detail and have found very few (see next paragraph) that do not succumb to the explanation that when we have a visual image of something, we are simply thinking about what it would look like if we saw it. We do this not out of any perverse desire to please the picture-theorists, but for many different reasons: we do it because recall of past events is better when we go through a scenario of recalling things in the order in which they occurred, because imagining things usually results in recall of similar past experiences, because when we think about how to solve a particular problem we think about the sequence of events we would go through, and, perhaps even more relevant in the context of an experiment, because the instruction to “imagine” something entails imagining what would happen in a real situation. But none of these reasons exposes a particular mental mechanism or a particular format; we could equally have visualized the problem differently, or not at all. In other words, it may well be that the experiments simply do not reveal anything about the format of our thoughts. I consider this alternative explanation (or closely related ones) to be the “null hypothesis” against which one ought to compare imagery theories, because it makes no assumptions about format – all the explanation resides in the tacit knowledge that people have about how things tend to happen in the world (regardless of how this knowledge is represented), together with their use of well-known psychophysical and cognitive skills. These skills include the ability to mark and track elements in a visual field, perhaps using the visual indexing or FINST mechanism (described in [7]), to mark and recall directions in proprioceptive space, to compute time-to-collision, to generate time intervals proportional to known quantities, to recall things better when in a situation resembling the one we were in when we first saw them, and so on.

Although a few experimental findings do not fit the simulation-from-tacit-knowledge explanation, none of them fit the picture theory either. Take for example the classical “mental rotation” finding, where the time it takes to determine that two figures are identical except for orientation has been shown to be a linear function of the difference in orientation between the figures’ principal axes. This result is obtained even if observers do not attempt to “rotate” their image and even when they do not experience the phenomenology of a rotating image. But this result does not support the claim that the “image” behaves as though it were bound to a surface that is rotated in a rigid and holistic manner. It is clear that some articulated process is taking place which, at the very least, involves consulting each of the figures in an iterative process. Eye tracking records, and the finding that the apparent rate of rotation depends on both the complexity of the figure and the comparison task, make it clear that the figure’s shape does not move rigidly through intermediate orientations as many had thought. There has been at least one proposal that attributes the rotation effect to the necessity of solving this problem within certain computational resource constraints [8]. Similarly, the ability of mental imagery to yield perceptual-motor adaptation effects [9] appears to be due not to tacit knowledge, but to the fact that imagining one’s hand position superimposed on a visual scene provides all the required conditions for adaptation without having to assume that a pictorial object is created. These and many other cases that are frequently cited in support of the picture theory are reviewed in [5,6].

The basic problem with any appeal to inherent properties of a mental image is this: Since it is your image, you can make it have very nearly any property, or exhibit any behavior you wish (and so in many circumstances you make it recreate whatever you think would happen if you were seeing some real event occur – such as the example of superposition of color filters illustrated in Box B). Consequently, in such cases nothing at all is gained by assuming that images are pictorial in form. Those who believe that various phenomena, such as the increased time it takes to scan greater distances in one’s image, or to distinguish details in a smaller image, are due to the form of the image, reply that the phenomena could not have been “faked”, since people often cannot predict what such experiments will show, nor can they articulate answers to certain questions when they do not use a mental image. But there is no question of “faking” answers or of being disingenuous – people simply do what they are supposed to do when asked to “image” something: They think about what would happen if they saw it and they use their tacit knowledge to generate some appropriate sequence of events in their mind (and thus to exhibit appropriate response times). Similarly, subjects answer questions about their image by consulting their tacit knowledge and their memory of the appearance of the relevant situation. The concept of tacit knowledge is central in cognitive science, where it is clear that access to such tacit knowledge depends on how the question is put and what the subject believes the task to be.

Those who believe that imaginal thinking is essentially pictorial often explicitly deny the claim that there are literally pictures in the brain. Yet the way that certain classical behavioral findings (e.g., increased time to scan greater distances or report details in small images) are explained requires such a literal picture. Something referred to as a “functional space” (such as a matrix data structure) will not do since such a space, being a fiction, can have any properties we like. That we find certain properties “natural” (e.g., that to scan a greater distance in a matrix requires that we pass through more empty cells) simply shows that the theorists tacitly assume the matrix to be a simulation of real space, since otherwise one would not need to “scan” over its cells in any particular order.
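The point about a matrix “functional space” can be made concrete with a small sketch. This is only an illustration under stated assumptions, not a model taken from the imagery literature: the grid size, the landmark names and the functions make_image, jump and scan are all hypothetical. It shows that the same matrix supports a direct lookup that visits one cell whatever the distance, and a cell-by-cell traversal whose cost grows with distance only because that traversal policy has been stipulated.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]  # (row, column) index into the matrix


def make_image(rows: int, cols: int, landmarks: Dict[str, Cell]) -> List[List[str]]:
    """Build a hypothetical matrix 'image': empty cells plus named landmarks."""
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    for name, (r, c) in landmarks.items():
        grid[r][c] = name
    return grid


def jump(grid: List[List[str]], target: Cell) -> Tuple[str, int]:
    """Read the target cell directly; one cell is visited whatever the distance."""
    r, c = target
    return grid[r][c], 1


def scan(grid: List[List[str]], start: Cell, target: Cell) -> Tuple[str, int]:
    """Step through neighbouring cells from start to target, counting cells visited.

    Nothing in the matrix requires this; the traversal is a stipulated policy
    that mimics movement through real space.
    """
    r, c = start
    tr, tc = target
    visited = 1
    while (r, c) != (tr, tc):
        r += (tr > r) - (tr < r)  # move one row toward the target, if needed
        c += (tc > c) - (tc < c)  # move one column toward the target, if needed
        visited += 1
    return grid[r][c], visited


if __name__ == "__main__":
    # Hypothetical map with three landmarks on the top row.
    image = make_image(10, 10, {"hut": (0, 0), "well": (0, 3), "tree": (0, 9)})
    print(scan(image, (0, 0), (0, 3)))  # near landmark, cell-by-cell
    print(scan(image, (0, 0), (0, 9)))  # far landmark, cell-by-cell
    print(jump(image, (0, 9)))          # far landmark, direct lookup
```

In this sketch the cell-by-cell policy visits 4 cells to reach the near landmark and 10 to reach the far one, while the direct lookup visits 1 cell in either case. Which policy is used is a choice made by whoever writes the program, which is the sense in which the distance effect is not a property of the matrix itself.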

It is important to see that the literal picture is essential if the format or the medium of the image, rather than something else (e.g., tacit knowledge), is to explain the standard imagery results. For example, in order to explain why it takes longer to mentally scan a greater distance on a mental image, something about the image format or medium would have to make the following equation come out to be true: time = distance / speed. This can be either a literal distance or some set of brain properties that are related by the same equation and are used to compute time. While the latter is a logical possibility, the odds against it are immense when you consider the very large number of constraints such an analog medium would have to satisfy. For example, it would have to satisfy the Euclidean axioms, Pythagoras’ theorem, and many physical laws governing acceleration, bouncing and recoil, and so on for all the properties that are often true of the behavior of our images (which includes momentum, see [10]).

It is much more plausible that most of these phenomena have little to do with the format of mental images, but rather with how people understand the task of examining an imagined map. For example, no picture format account can explain why the mental scanning effect disappears if subjects believe that what they are to imagine is a process that would not take longer for larger distances (or one in which something other than scanning is emphasized – such as computing relative directions from the image). While it seems obvious that this is the case (try imagining some scene and switch your attention from one place to another without moving continuously through the void in between), we have gone to the trouble of showing this in several experiments [11]. Such demonstrations illustrate what I have referred to as the cognitive penetrability of phenomena alleged to be constrained by the nature of the format or medium (i.e., they show that the observed phenomena change in a rational way as your beliefs or goals change), which provides strong evidence that the effect was not due to a property of the image format (or the way the image is displayed in the brain).
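The logic of the cognitive penetrability argument can be laid out as a toy contrast between two kinds of account. The sketch below is purely illustrative: the function names, the task labels, and the speed and baseline values are hypothetical and are not drawn from any experiment. It assumes only what the text states, namely that a fixed-format account ties time to distance regardless of belief, whereas a tacit-knowledge account ties time to distance only when the subject takes the task to be one in which greater distances would take longer.

```python
def picture_model_time(distance: float, speed: float = 2.0) -> float:
    """Fixed-format account: time = distance / speed, whatever the subject believes.

    The speed value is an arbitrary illustrative constant.
    """
    return distance / speed


def tacit_knowledge_time(
    distance: float,
    believed_task: str,
    speed: float = 2.0,
    baseline: float = 0.5,
) -> float:
    """Simulation account: time tracks distance only if the subject takes the
    task to be one where greater distances would take longer in reality.

    The task label "scan" and the baseline value are hypothetical.
    """
    if believed_task == "scan":
        return distance / speed
    # For any other construal of the task, distance plays no role.
    return baseline


if __name__ == "__main__":
    for d in (2.0, 4.0, 8.0):
        print(
            d,
            picture_model_time(d),
            tacit_knowledge_time(d, "scan"),
            tacit_knowledge_time(d, "judge_direction"),
        )
```

Under the scanning construal the two toy accounts make the same prediction, so the standard scanning result cannot separate them. They come apart only when the subject's understanding of the task changes, which is why the disappearance of the effect under a different construal counts against the fixed-format account.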