Probing the mysterious underpinnings of multi-voxel fMRI analyses
Hans P. Op de Beeck*
* Laboratory of Biological Psychology, University of Leuven (K.U.Leuven), Belgium
NeuroImage, in press
Correspondence:
Hans Op de Beeck, Ph.D.
University of Leuven
Laboratory of Biological Psychology
Tiensestraat 102, B-3000 Leuven
Belgium
Tel: +32-16-32.59.30
Fax: +32-16-32.60.99
E-mail:
ABSTRACT
Various arguments have been proposed for or against sub-voxel sensitivity or hyperacuity in functional magnetic resonance imaging (fMRI) at standard resolution. Sub-voxel sensitivity might exist, but nevertheless the performance of multi-voxel fMRI analyses is very likely to be dominated by a larger-scale organization, even if this organization is very weak. Up to now, most arguments are indirect in nature: they do not in themselves prove or contradict sub-voxel sensitivity, but they are suggestive, seem consistent or not with sub-voxel sensitivity, or show that the principle might or might not work. Here the previously proposed smoothing argument against hyperacuity is extended with simulations that include more realistic signal, noise, and analysis properties than any of the simulations presented before. These simulations confirm the relevance of the smoothing approach for finding out the scale of the functional maps that underlie the outcome of multi-voxel analyses, at least in relative terms (differences in the scale of different maps). However, image smoothing, like most other arguments in the literature, is an indirect argument, and at the end of the day such arguments are not sufficient to decide whether and how much sub-voxel maps contribute. A few suggestions are made about the type of evidence that is needed to help us understand the as yet mysterious underpinnings of multi-voxel fMRI analyses.
Hyperacuity or sub-voxel sensitivity refers to the possibility of reliably detecting a functional map that is organized at a scale that is notably smaller than the voxel size in an fMRI dataset. Such small-scale maps exist, and they have been measured directly in humans with spin-echo fMRI at high field strength (Yacoub et al., 2008), methods that provide an exceptional signal to noise ratio (allowing scanning at higher resolution) and signals that reflect smaller vascular components to a much higher degree than regular fMRI scans (Duong et al., 2003). With standard methods, single voxels do not provide reliable information about such maps, but the information from multiple voxels together might be more useful. Indeed, sub-voxel sensitivity may underlie many recent findings where multi-voxel pattern analysis techniques were able to reveal reliable patterns in cases where univariate voxel-by-voxel analyses failed (Downing et al., 2007; Haynes and Rees, 2006; Kamitani and Tong, 2005; Op de Beeck et al., 2008). Nevertheless, there is an alternative explanation that is very hard to disprove, namely the existence of a weak large-scale organization.
Indirect arguments in favour of and against hyperacuity
If we assume that the BOLD signal reflects neural activity, an assumption that is not under discussion here, then it is trivial to say that a voxel's signal originates from sub-voxel activity. For example, the face preference of voxels in the fusiform face area reflects the sensitivity of neurons within this region (and neurons are sub-voxel). However, the strong preference in the fMRI signal is due to a large scale of organization: the neuronal selectivity in this area is quite uniform at a scale larger than a voxel size. The question at hand is the degree to which multi-voxel pattern analyses pick up neuronal selectivity that is organized at a scale smaller than a voxel size.
Many studies applying multi-voxel pattern analyses have remained agnostic about what type and scale of organization might underlie the reliable patterns. The case in favour of sub-voxel sensitivity has been made most strongly for the decoding of grating orientation based on the pattern of activity in primary visual cortex (Haynes and Rees, 2006; Kamitani and Tong, 2005). The main arguments are as follows. First, a map of the orientation preferences across voxels in unsmoothed data reveals a scattered pattern that shows little clustering across voxels and that is idiosyncratic across subjects. Second, the only large-scale supra-voxel organization of orientation selectivity that has been consistently reported in the literature did not appear to underlie the bulk of the orientation selectivity. Third, simulations with small orientation columns sub-sampled at voxel resolution showed that a sub-voxel organization can indeed be picked up by larger voxels if the orientation columns are distributed randomly across the cortical surface. Finally, a fourth argument has been put forward recently (Kriegeskorte et al., 2009), based on the very ingenious but as yet speculative hypothesis that the contribution of high-frequency signals might be unexpectedly large because fMRI voxels sample neural activity in an inhomogeneous manner due to the properties of the neuro-vascular coupling. Some of these arguments have already been discussed by Op de Beeck (2009), and further comments are given in the later section "A critical assessment of indirect arguments".
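The third argument (sub-sampling of randomly distributed columns) can be illustrated with a minimal sketch. It is not the simulation used in the cited studies: it assumes, for simplicity, that each column independently prefers one of two orientations, and the grid sizes (8 x 8 columns per voxel, 32 x 32 voxels) are hypothetical values chosen for illustration. Under these assumptions each voxel retains a small residual bias whose standard deviation is 1/sqrt(number of columns per voxel), and neighbouring voxels are uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(0)

columns_per_voxel_side = 8   # hypothetical: 8 x 8 columns inside each voxel
n_voxels_side = 32           # hypothetical: 32 x 32 voxels in the simulated patch
n_columns_side = columns_per_voxel_side * n_voxels_side

# Assumption: each column prefers orientation A (+1) or B (-1), independently at random.
column_preference = rng.choice([-1.0, 1.0], size=(n_columns_side, n_columns_side))

# Sub-sample at voxel resolution: a voxel's bias is the mean preference of its columns.
blocks = column_preference.reshape(
    n_voxels_side, columns_per_voxel_side, n_voxels_side, columns_per_voxel_side
)
voxel_bias = blocks.mean(axis=(1, 3))

observed_sd = float(voxel_bias.std())
expected_sd = 1.0 / columns_per_voxel_side  # 1 / sqrt(64 columns per voxel)

# Correlation between horizontally adjacent voxels (no clustering is built in).
neighbour_r = float(
    np.corrcoef(voxel_bias[:, :-1].ravel(), voxel_bias[:, 1:].ravel())[0, 1]
)

print(f"voxel bias SD: observed {observed_sd:.3f}, expected {expected_sd:.3f}")
print(f"correlation between neighbouring voxels: {neighbour_r:.3f}")
```

The residual voxel bias is what a multi-voxel classifier could exploit in principle; whether it survives realistic noise and subject motion is a separate question.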
It is clear that these arguments are circumstantial rather than conclusive: none of them directly shows sub-voxel selectivity; they just make it a reasonable hypothesis. Op de Beeck (2009) added another circumstantial argument, this time an argument against sub-voxel selectivity. This argument was based on the effect of image smoothing on the outcome of multi-voxel analyses. It was suggested that this effect can be different for a small-scale (sub-voxel) and a large-scale organization. Empirical data about orientation selectivity in V1 and novel object selectivity in visual object-selective cortex seemed most consistent with the presence of a large-scale organization.
Smoothing affects the signal to noise ratio, not the absolute amount of information
Kamitani and Sawahata (2009) criticized the relevance of this smoothing approach for deciding between a small-scale and a large-scale organization based on two comments. A first comment started from the fact that smoothing does not decrease the absolute amount of information in a given dataset, at least not when the unavoidable spatial uncertainty caused by smoothing does not interfere with any step in the analysis (e.g., defining precise regions of interest; no smoothing of voxels within and across ROI borders). This is a good thing to keep in mind. However, the outcome of statistical analyses is strongly dependent on the relative amount of information compared to the amount of noise in a dataset, that is, the signal to noise ratio.
With smoothing and other types of filtering we can manipulate this relative strength of the to-be-detected signal, at least if the signal and the noise differ in their power spectrum. And this is exactly why smoothing is relevant for deciding between a small-scale and a large-scale organization. A small-scale organization would be expected to benefit from enhancing the higher spatial frequencies in the signal compared to the lower spatial frequencies (at least if the noise is not too strong in these lower frequency bands), while the opposite is true for a large-scale organization. So it is not the absolute amount of information, which is indeed preserved by smoothing if the smoothing kernel is invertible, that matters for the argument, but the relative amount of information compared to the amount of noise. This verbal argument is confirmed by the simulation described in Fig. 1 where a smoothing kernel is applied that is invertible but nevertheless strongly affects the outcome of multivariate analyses.
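The verbal argument can be sketched in a few lines. The following is not the simulation of Fig. 1; it is a toy illustration under stated assumptions: a 64 x 64 voxel slice, white noise with a standard deviation of 2, no subject motion, a fine-scale map modelled as spatially white, a coarse-scale map modelled as white noise smoothed with an 8-voxel FWHM kernel, and a 4-voxel FWHM analysis kernel. All of these values and all function names are hypothetical. The Gaussian transfer function is strictly positive at every spatial frequency, so the kernel is invertible in principle, yet the split-half pattern correlation changes.

```python
import numpy as np

FWHM_TO_SIGMA = 1.0 / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def gaussian_transfer(shape, fwhm):
    """Fourier-domain transfer function of a Gaussian kernel (FWHM in voxels)."""
    sigma = fwhm * FWHM_TO_SIGMA
    fy = np.fft.fftfreq(shape[0])[:, None]
    fx = np.fft.fftfreq(shape[1])[None, :]
    return np.exp(-2.0 * np.pi ** 2 * sigma ** 2 * (fx ** 2 + fy ** 2))

def smooth(image, fwhm):
    """Circular Gaussian smoothing via the FFT; fwhm <= 0 returns a copy."""
    if fwhm <= 0:
        return image.copy()
    spectrum = np.fft.fft2(image) * gaussian_transfer(image.shape, fwhm)
    return np.real(np.fft.ifft2(spectrum))

def make_map(shape, scale_fwhm, rng):
    """Random functional map with unit variance and a given spatial scale."""
    field = smooth(rng.standard_normal(shape), scale_fwhm)
    return (field - field.mean()) / field.std()

def split_half_r(signal, noise_sd, smooth_fwhm, rng):
    """Correlation between two independently noisy, smoothed copies of a map."""
    halves = [
        smooth(signal + noise_sd * rng.standard_normal(signal.shape), smooth_fwhm)
        for _ in range(2)
    ]
    return float(np.corrcoef(halves[0].ravel(), halves[1].ravel())[0, 1])

rng = np.random.default_rng(0)
shape = (64, 64)        # hypothetical slice size in voxels
noise_sd = 2.0          # hypothetical noise level (signal has unit variance)
analysis_fwhm = 4.0     # hypothetical analysis smoothing kernel, in voxels

maps = {"fine": make_map(shape, 0.0, rng), "coarse": make_map(shape, 8.0, rng)}
results = {
    name: (split_half_r(m, noise_sd, 0.0, rng),
           split_half_r(m, noise_sd, analysis_fwhm, rng))
    for name, m in maps.items()
}

# The kernel removes no information in principle: its transfer function is > 0.
kernel_is_invertible = bool(np.all(gaussian_transfer(shape, analysis_fwhm) > 0))

for name, (r_raw, r_smooth) in results.items():
    print(f"{name:6s} map: r unsmoothed = {r_raw:.2f}, r smoothed = {r_smooth:.2f}")
print("kernel invertible in principle:", kernel_is_invertible)
```

In this toy setting the coarse map gains strongly from smoothing because its power sits at low spatial frequencies where the kernel attenuates little, whereas the white noise is attenuated everywhere else. A fine-scale map with the same flat spectrum as the noise has no such advantage.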
The effect of smoothing under conditions of subject motion and spatially correlated noise
A second criticism of the initial predictions formulated by Op de Beeck (2009) is that these predictions change when one considers the effect of subject motion and consequential motion correction (Kamitani and Sawahata, 2009), keeping all other aspects of the simplistic simulations reported by Op de Beeck (2009) unaltered. Including subject motion, Kamitani and Sawahata find qualitatively similar effects of smoothing for both a small-scale and a large-scale organization. Note, however, that the effects were quantitatively very different: the beneficial effect of smoothing on a multi-voxel correlational analysis was twice as strong for the large-scale compared to the small-scale organization. Thus, even when taking these new simulations at face value, they do not show that the smoothing approach is irrelevant. Different scales of organization were differentially affected by smoothing in all simulations! Nevertheless, this discussion illustrates that the absolute effects of smoothing (without a benchmark dataset) can be difficult to interpret. This is a general problem with circumstantial evidence, as I will discuss later on for the arguments in favour of sub-voxel sensitivity.
However, the most important concern about this second comment is that the effect of subject motion and its correction is washed away by another factor that was discussed by Op de Beeck (2009): the power spectrum of the noise in fMRI data. Up to now, all simulations included white noise, thus assuming that the noise is independent for neighbouring voxels. This is unrealistic, and real fMRI data reveal clear spatial correlations in the unsystematic variations between neighbouring pixels. Part of these correlations is due to subject motion and consequential motion correction, but another part might be due to the smoothed nature of the physiological phenomena underlying the fMRI signal. Spatially spread physiological effects (Woolrich et al., 2004) at a fairly large scale might very well contribute to the noise.
What happens when we include very moderate spatially correlated noise in the simulations and look for the effect of smoothing? As was already mentioned briefly before (Op de Beeck, 2009), with this type of noise the empirically observed increase of spatial correlations as a function of smoothing can only be found with a very large-scale functional organization. If the noise spectrum has a spatial correlation matrix corresponding to a kernel of 4 mm full-width-at-half-maximum, then the small-scale organization as used in the original simulations shows decreased spatial correlations with higher degrees of smoothing, and even the original large-scale organization shows no clear beneficial effect of smoothing. These results are shown in Figure 2. It takes much larger scales of organization to see a clear beneficial effect of smoothing as found empirically. Note that these newest simulations include subject motion. Thus, not only does the inclusion of spatially correlated noise wash away the effect of subject motion, it even makes the situation worse if one tries to make an argument in favour of sub-voxel sensitivity: the spatial scale needed to simulate the empirical effects is even larger than the scale of the large-scale organization used before by Op de Beeck (2009). This finding is easy to understand from the effect of smoothing on the Fourier amplitude spectrum of the data. Spatially correlated noise has more power in the lower spatial frequencies that also contain most of the signal in the large-scale organization used by Op de Beeck (2009).
Finally, it is important to understand what really causes the beneficial effect of smoothing on the decoding of a small-scale organization when subject motion is present and noise is not correlated (the observation of Kamitani and Sawahata, 2009). This is not because the performance goes up due to smoothing, but because performance decreases drastically in the unsmoothed data (compared to an unsmoothed analysis of data without subject motion as reported by Op de Beeck, 2009). The importance of this observation can be illustrated by the example maps shown in Figure 4 in Op de Beeck (2009). For the maps in that figure, MVPA performance on unsmoothed data was equal for a notably weak large-scale organization and a strong small-scale organization. Given that MVPA performance for the small-scale organization is affected disproportionally by subject motion, an even weaker large-scale organization would result in comparable correlations. This makes it even more relevant to uphold the hypothesis that all results might be caused by a large-scale organization. This observation also questions the consistency of the hyperacuity hypothesis with the finding of Kamitani and Tong (2005) that orientation decoding performance survives a large position shift: decoding performance for sessions separated by 31 and 40 days was almost as good as within-session decoding.
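A minimal sketch of the role of spatially correlated noise follows. It is not the simulation behind Figure 2 and it omits subject motion. The assumptions are hypothetical and chosen only for illustration: 2 mm voxels (so that a 4 mm FWHM noise kernel equals 2 voxels), a 64 x 64 slice, a noise standard deviation of 2, a 4-voxel analysis kernel, a spatially white fine-scale map, and a very coarse map with a scale of 16 voxels FWHM. The sketch reports the mean split-half correlation over repeated simulated datasets and the share of noise power below 0.1 cycles per voxel.

```python
import numpy as np

def smooth(image, fwhm):
    """Circular Gaussian smoothing via the FFT (FWHM in voxels); fwhm <= 0 copies."""
    if fwhm <= 0:
        return image.copy()
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    transfer = np.exp(-2.0 * np.pi ** 2 * sigma ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))

def unit_variance(field):
    return (field - field.mean()) / field.std()

def mean_split_half_r(signal_fwhm, noise_fwhm, smooth_fwhm, rng,
                      shape=(64, 64), noise_sd=2.0, n_rep=30):
    """Mean correlation between two noisy, smoothed measurements of a random map."""
    rs = []
    for _ in range(n_rep):
        signal = unit_variance(smooth(rng.standard_normal(shape), signal_fwhm))
        halves = []
        for _ in range(2):
            # Spatially correlated noise: smoothed white noise rescaled to noise_sd.
            noise = noise_sd * unit_variance(
                smooth(rng.standard_normal(shape), noise_fwhm))
            halves.append(smooth(signal + noise, smooth_fwhm))
        rs.append(np.corrcoef(halves[0].ravel(), halves[1].ravel())[0, 1])
    return float(np.mean(rs))

def low_frequency_power_fraction(field, cutoff=0.1):
    """Share of (mean-removed) power below `cutoff` cycles per voxel."""
    power = np.abs(np.fft.fft2(field - field.mean())) ** 2
    fy = np.fft.fftfreq(field.shape[0])[:, None]
    fx = np.fft.fftfreq(field.shape[1])[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)
    return float(power[radius < cutoff].sum() / power.sum())

rng = np.random.default_rng(1)
noise_fwhm = 2.0      # 4 mm FWHM at an assumed voxel size of 2 mm
analysis_fwhm = 4.0   # hypothetical analysis smoothing kernel, in voxels

r_fine_raw = mean_split_half_r(0.0, noise_fwhm, 0.0, rng)
r_fine_smooth = mean_split_half_r(0.0, noise_fwhm, analysis_fwhm, rng)
r_coarse_raw = mean_split_half_r(16.0, noise_fwhm, 0.0, rng)
r_coarse_smooth = mean_split_half_r(16.0, noise_fwhm, analysis_fwhm, rng)

white = rng.standard_normal((64, 64))
frac_white = low_frequency_power_fraction(white)
frac_correlated = low_frequency_power_fraction(smooth(white, noise_fwhm))

print(f"fine map:   r {r_fine_raw:.2f} -> {r_fine_smooth:.2f} after smoothing")
print(f"coarse map: r {r_coarse_raw:.2f} -> {r_coarse_smooth:.2f} after smoothing")
print(f"low-frequency noise power: white {frac_white:.2f}, "
      f"correlated {frac_correlated:.2f}")
```

With these toy settings smoothing lowers the correlation for the fine-scale map, because the correlated noise survives the low-pass filter better than the signal does, while only a map much coarser than the noise benefits. The exact numbers depend on the assumed voxel size, noise level, and kernel width.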
A critical assessment of indirect arguments
I do not want to imply that the smoothing approach will provide the definitive answer to all our questions, but at least the smoothing approach can be relevant in our quest to find out about the scale of organization that underlies a positive result in multi-voxel pattern analyses. This holds in particular in relative terms, for example to find out whether one feature is organized at a finer or larger scale compared to another feature. From this assessment I would say that the smoothing approach is at least as relevant as all the indirect arguments that have been put forward in favour of sub-voxel sensitivity, given that these arguments face difficulties as well. One argument, the visual inspection of unsmoothed and noisy activation maps, is less informative than one would think intuitively (for a demonstration, see Op de Beeck, 2009). Another argument was based on simulations that included effects of subject motion but not the potentially stronger effects of spatially correlated noise. So if the smoothing approach suffers from interpretational problems due to the indirect nature of the evidence and the simplicity of the simulations, then this feasibility argument suffers from the same interpretational problems.
A third argument in favour of sub-voxel sensitivity can be summarized as saying that we might want to uphold the hyperacuity hypothesis simply because we cannot directly come up with a large-scale organization that would explain the results. Needless to say, by itself this is not a convincing scientific argument, especially because even a very weak large-scale organization would be enough to strongly dominate the performance in multi-voxel analyses, and many relatively strong large-scale maps are known to exist. In the case of orientation selectivity, a global bias exists for radial orientations, which is a relationship between preferred orientation and retinotopic location. Previous studies are not in agreement about how strong it is and how much it might contribute to the decoding of grating orientation (Kamitani and Tong, 2005; Sasaki et al., 2006). In the case of selectivity for novel objects, the other illustration used by Op de Beeck (2009), we know from studies in monkeys that the relevant anatomical regions contain relatively small feature columns (diameter around 0.5 mm) as well as larger functional domains such as face-selective patches (Tsao et al., 2006). So the selectivity for novel objects as measured with fMRI might reflect a weak selectivity of these larger functional regions as well as, or even more than, a strong selectivity of small feature columns.
The road ahead
The message here is not that the smoothing argument is better than any of these other arguments. Instead, the message is that all arguments are equally problematic. This is due to their indirect nature. We only have circumstantial evidence for or against hyperacuity. What sort of evidence do we need? Ideally, one would have a direct measure of selectivity, e.g. for orientation, by scanning at high resolution using techniques such as high-field scanning to emphasize the small components of the physiological underpinnings of the fMRI signal (Yacoub et al., 2008). In the best of all worlds we would have a validation of these measurements in an animal model so that we can verify that the measured columns correspond to a real neural architecture (see e.g. Fukuda et al., 2006). Once these validations and high-field measurements are in place, the next step is to co-register the high-field fMRI data with standard low-field fMRI data to find out to what degree the low-field fMRI data indeed reflect the low-resolution sub-sampling of a small-scale functional organization. This is not a trivial task given the different geometric distortions that might occur at different field strengths and with different MRI sequences. Nevertheless, in order to get conclusive evidence in favour of hyperacuity we need this whole chain of information, all the way from a direct and validated measurement of the small-scale units/columns up to a co-registered standard low-resolution fMRI dataset.
Relevant evidence was already obtained in the recent work of Shmuel and colleagues (Shmuel et al., 2009). Most of this work concerns a feature that is organized at a somewhat larger scale compared to orientation selectivity: ocular dominance (Yacoub et al., 2008). The data suggested the presence of information carried by both a fine-scale and a coarse-scale organization, with a link between the coarse-scale structures and macroscopic blood vessels. These data were obtained at 7 T magnetic field strength, and even larger contributions from coarse-scale organizations are expected at 3 T. These data are consistent with the statement that the presence of a large-scale, supra-voxel organization in neural maps and/or vasculature is a strong competitor to the hypothesis of hyperacuity.