Supplementary Materials for:
Recovery of surface pose from texture orientation statistics under perspective projection
Paul A. Warren & Pascal Mamassian
Human behavioural data
We undertook an experiment with three human observers to investigate whether any similarities could be found between the models presented and human behaviour. There is evidence that human observers commonly underestimate surface slant when only texture information is present (e.g. see Gibson, 1950; Braunstein, 1968). This finding would be consistent with the results of our simulations with the EXP but not the MAP decision rule. Consequently our first goal was to assess whether human slant settings using the types of stimuli shown to the model would result in similar patterns of perceived slant to those produced by our models.
The second goal of this behavioural study was to investigate whether human participants are sensitive to perspective orientation information. The perspective model presented makes use of both the position of texels on the plane and the distribution of texel orientations under perspective projection, so it is of interest to assess whether observers can also use the latter source of information.
Methods
One observer was the first author (PAW); the other two observers were naïve to the purpose of the experiment but had participated in previous psychophysical studies. All stimuli were generated using the PsychToolbox (Pelli, 1997; Brainard, 1997) library in MATLAB, and were presented using a G4 Macintosh on a 21” Sony LCD screen with a refresh rate of 75 Hz. In a single trial observers saw a stimulus at a viewing distance of 41 cm, viewed through a real circular aperture with a diameter of approximately 40 degrees of visual angle. The aperture was positioned 10 cm from the observer’s eye. In order to minimise the effects of surrounding cues to depth, the aperture covered the frame of the monitor and the room was dark. Viewing was monocular and observers rested their head in a chin and head rest.
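The viewing geometry described above can be checked with a short calculation. The following is a minimal sketch, assuming that the 40 degree aperture is centred on the line of sight and that the eye acts as a pinhole; the function name `extent_cm` is introduced here for illustration only.

```python
import math

def extent_cm(angle_deg, distance_cm):
    """Linear extent (cm) that subtends angle_deg, centred on the line of
    sight, at distance_cm from the eye (pinhole-eye assumption)."""
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)

# Physical diameter of the aperture placed 10 cm from the eye.
aperture_diameter_cm = extent_cm(40.0, 10.0)

# Diameter of the screen region visible through it at 41 cm.
visible_screen_cm = extent_cm(40.0, 41.0)

print(round(aperture_diameter_cm, 2), round(visible_screen_cm, 2))
```

Under these assumptions the aperture is roughly 7.3 cm across and exposes a circular region of roughly 30 cm on the screen.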
Each stimulus consisted of a set of oriented white line segments uniformly distributed in space on a grey virtual surface, which was then slanted (top end away, like a ground plane, corresponding to a tilt of 90 degrees) by an angle of 25, 45 or 65 degrees. The orientations of the lines on the surface in each stimulus followed a uniform distribution and thus satisfied the assumption of orientational isotropy. The line segments varied in length in the image between 0.15 and 0.6 degrees of visual angle. The size of the lines was varied in order to minimise the already relatively small potential effects of the size cue to slant. Furthermore, to minimise the impact of the density cue to slant, the density of texels on the surface prior to projection was set to be rather low (0.1 elements/cm²). We assumed that, with size information made inconsistent and density information reduced, these cues would be relatively down-weighted and therefore the impact of the foreshortening cue would be maximised.
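The texture described above can be sketched as follows. This is an illustrative sketch rather than the code used to generate the stimuli: it assumes a square plane, a fixed (rounded) number of texels, and orientations drawn on [0, π]; the function name `sample_texels` and the half-extent of 162 cm (taken from the plane size quoted later in the Methods) are assumptions of the example.

```python
import math
import random

def sample_texels(half_extent_cm, density_per_cm2, rng):
    """Return a list of (u, v, theta) texels on the unslanted surface.

    Positions (u, v) are uniform over a square of side 2 * half_extent_cm;
    orientations theta are uniform on [0, pi], i.e. isotropic on the surface.
    """
    side = 2.0 * half_extent_cm
    n_texels = round(density_per_cm2 * side * side)
    texels = []
    for _ in range(n_texels):
        u = rng.uniform(-half_extent_cm, half_extent_cm)
        v = rng.uniform(-half_extent_cm, half_extent_cm)
        theta = rng.uniform(0.0, math.pi)
        texels.append((u, v, theta))
    return texels

# Hypothetical example: density of 0.1 elements/cm^2 on a +/-162 cm plane.
texels = sample_texels(162.0, 0.1, random.Random(1))
print(len(texels))
```

Line length is not modelled here because, in the experiment, length was set directly in the image rather than on the surface.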
Note that recent work (Watt et al., 2005) has suggested that focus cues such as blur and accommodation can also contribute significantly to the percept of depth and consequently could impact upon perceived slant (in the present case they would be expected to reduce the slant estimate since they indicate no change in depth between the top and bottom of the display). In practice it is very difficult to control for such cues; however, we should bear the potential contribution of focus cues in mind when interpreting our data. Any relative comparison of slant settings between conditions automatically controls for the effect of such cues, since they were present in all conditions.
On each trial the observer saw a fixation cross at the centre of the screen. After 500 ms the cross disappeared and a single surface was presented. In order to minimise the potential effects of eye movements, the surface was presented for only 150 ms (11 frames). The observer then saw a side-on, on-screen representation of the slanted plane. Using two keys, the observer could increase or decrease the slant of the plane between 0 degrees (fronto-parallel) and 90 degrees (seen edge-on) and was told to give their best estimate of the surface slant. Note that in practice our observers never reported that the surfaces had negative slant (i.e., they never saw a ceiling plane). When happy with the judgment, the observer pressed a third key to initiate the next trial.
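The correspondence between the nominal 150 ms presentation and 11 frames follows from the 75 Hz refresh rate. A minimal sketch, assuming that the nominal duration is truncated to a whole number of frames (the helper names are introduced here for illustration):

```python
REFRESH_HZ = 75.0  # refresh rate reported in the Methods

def whole_frames(duration_ms, refresh_hz=REFRESH_HZ):
    """Number of complete frames that fit in duration_ms (truncation assumed)."""
    return int(duration_ms * refresh_hz / 1000.0)

def frames_to_ms(n_frames, refresh_hz=REFRESH_HZ):
    """Actual duration of n_frames at the given refresh rate."""
    return 1000.0 * n_frames / refresh_hz

n = whole_frames(150.0)
print(n, round(frames_to_ms(n), 1))
```

On this assumption the physical presentation time is slightly shorter than the nominal 150 ms.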
An anonymous reviewer suggested that since the mapping between perceived slant and the setting of slant using the side-on probe is unknown, the data obtained may be biased away from the true percept. Whilst we are prepared to admit this possibility and agree that this could have implications for the interpretation of our data in terms of the model, we make the assumption that these biases are small on average relative to the large underestimate of slant which has been reported in several previous tasks. Furthermore, when asked about their setting of the probe the observers indicated that they simply perceived the slant and then converted it into a number which was represented using the side-on probe. It is assumed that this strategy would not lead to large biases away from perceived slant.
For each stimulus, the surface was projected onto the screen using one of two projection conditions. One projection condition used a consistent process to generate the location and orientation of the line segments. This was the perspective location, perspective orientation (Lp, Op) condition. The second was a mixed condition in which the location of the line segment was generated under perspective projection while the orientation used orthographic projection (Lp, Oo). This manipulation was undertaken to try to demonstrate that human observers are sensitive to and can use the information present in the perspective projection of orientation. For both conditions, care was taken not to provide any horizon cue to slant, i.e., even in the highest slant condition (65 degrees), the 40 degree aperture was completely filled with texture elements. In order to do this, a plane much larger than the screen (±162 cm) was defined by its texture elements and only lines falling within the on-screen region bounded by the aperture were then projected in the image. If human observers can use the information arising due to perspective projection of orientation information then we would expect better slant estimation in the (Lp, Op) condition.
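The difference between the two projection conditions can be sketched for a single texel. This is an illustrative reconstruction, not the stimulus code: it assumes a pinhole eye at the origin looking along the z axis, a surface that passes through the centre of the screen and is rotated about the horizontal axis (top away), and image orientation defined as the direction of the projected line at the texel location. The name `project_texel` and its parameters are assumptions of the example.

```python
import math

def project_texel(u, v, theta, slant_deg, view_dist_cm=41.0,
                  perspective_orientation=True):
    """Project a surface texel at (u, v) with surface orientation theta.

    Location always uses perspective projection (Lp). Orientation uses
    perspective projection (Op) or orthographic projection (Oo).
    Returns (x_img, y_img, orientation in [0, pi)).
    """
    s = math.radians(slant_deg)
    # 3D position of the texel on the slanted plane.
    X, Y, Z = u, v * math.cos(s), view_dist_cm + v * math.sin(s)
    if Z <= 0.0:
        raise ValueError("texel lies behind the eye")
    # 3D direction of the line segment.
    Dx = math.cos(theta)
    Dy = math.sin(theta) * math.cos(s)
    Dz = math.sin(theta) * math.sin(s)
    # Perspective location on the screen.
    x_img = view_dist_cm * X / Z
    y_img = view_dist_cm * Y / Z
    if perspective_orientation:
        # Direction of the perspective image of the line at this location.
        ori = math.atan2(Dy * Z - Y * Dz, Dx * Z - X * Dz)
    else:
        # Orthographic image of the line: depth component simply dropped.
        ori = math.atan2(Dy, Dx)
    return x_img, y_img, ori % math.pi

lp_op = project_texel(10.0, 5.0, 0.3, 45.0, perspective_orientation=True)
lp_oo = project_texel(10.0, 5.0, 0.3, 45.0, perspective_orientation=False)
print(lp_op, lp_oo)
```

In this sketch the two conditions place the texel at the same image location and agree on its orientation only on the line of sight; away from it, the perspective orientation also depends on the texel's position.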
We used a fully crossed design: combining the three surface slants with the two projection methods gave a total of 6 different stimulus conditions. These conditions were randomly interleaved, and each of the three observers saw each condition 20 times over a period of a few days.
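The resulting trial structure for one observer can be sketched as below. The condition labels and the function `make_trials` are hypothetical, and the sketch ignores how trials were divided across sessions, which is not specified here.

```python
import itertools
import random

SLANTS_DEG = (25, 45, 65)
PROJECTIONS = ("Lp_Op", "Lp_Oo")  # hypothetical labels for the two conditions
REPEATS = 20

def make_trials(seed=0):
    """Return a randomly interleaved list of (slant, projection) trials."""
    rng = random.Random(seed)
    conditions = list(itertools.product(SLANTS_DEG, PROJECTIONS))
    trials = [cond for cond in conditions for _ in range(REPEATS)]
    rng.shuffle(trials)
    return trials

trials = make_trials()
print(len(trials), trials[:3])
```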
Results
The results are shown in figure S1 for the “composite observer” obtained by averaging over the three participants. The averaging process was justified since all three participants produced very similar data. As has been shown previously (e.g. Gruber and Clark, 1956; Braunstein, 1968), observers show a marked underestimation of the simulated slant. This is particularly the case at intermediate slants; as the simulated slant increased, observers’ settings became more veridical.
Figure S1: Results of experiment with human participants. Data are averaged over 3 participants. Error bars represent the average standard error (over the three observers). The open circles correspond to the (Lp, Op) condition in which texel location and orientation were both determined by perspective projection. The open squares correspond to the (Lp, Oo) condition in which texel location is determined by perspective projection but texel orientation is determined by an orthographic projection process.
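The composite observer and its error bars can be computed as in the following sketch, which assumes that the standard error is first computed within each observer (over that observer's repeated settings in one condition) and then averaged across observers. The function name and the numbers in the example are made up for illustration and are not experimental data.

```python
import math
import statistics

def composite_observer(settings_by_observer):
    """Average per-observer means and per-observer standard errors.

    settings_by_observer: one list of slant settings (degrees) per observer,
    all for the same stimulus condition.
    """
    means = [statistics.mean(s) for s in settings_by_observer]
    sems = [statistics.stdev(s) / math.sqrt(len(s)) for s in settings_by_observer]
    return statistics.mean(means), statistics.mean(sems)

# Made-up settings for two hypothetical observers, two trials each.
mean_setting, mean_sem = composite_observer([[10, 20], [30, 50]])
print(mean_setting, mean_sem)
```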
Note that there is a clear difference in slant estimation between the two projection conditions. In particular, it appears that when orientations are determined by perspective projection the observer perceives more slant at intermediate simulated slants.
This interpretation was tested in a two-factor (projection condition × world slant) repeated measures ANOVA. We found significant main effects of slant (F(2, 4) = 77.28, p = 0.001) and projection condition (F(1, 2) = 22.35, p = 0.042) and no significant interaction between these factors (F(2, 4) = 18.62, p = 0.127). These results provide some evidence that human observers are sensitive to orientation information obtained under perspective projection and experience greater perceived slant in this condition (particularly at intermediate slants).
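For reference, the structure of such an analysis can be sketched as follows. This is a generic two-factor repeated measures ANOVA with one mean setting per observer and condition, in which each effect is tested against its own effect × subject error term; it is not the analysis script used for the reported values, and the data in the example are made up. With 3 observers, 2 projection conditions and 3 slants it yields the degrees of freedom quoted above.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)

def rm_anova_2way(y):
    """y[s][a][b]: one value per subject s, level a of factor A, level b of B.

    Returns {effect: (F, (df_effect, df_error))}.
    """
    n, na, nb = len(y), len(y[0]), len(y[0][0])
    S_, A_, B_ = range(n), range(na), range(nb)
    G = mean(y[s][a][b] for s in S_ for a in A_ for b in B_)
    S = [mean(y[s][a][b] for a in A_ for b in B_) for s in S_]
    A = [mean(y[s][a][b] for s in S_ for b in B_) for a in A_]
    B = [mean(y[s][a][b] for s in S_ for a in A_) for b in B_]
    AB = [[mean(y[s][a][b] for s in S_) for b in B_] for a in A_]
    AS = [[mean(y[s][a][b] for b in B_) for s in S_] for a in A_]
    BS = [[mean(y[s][a][b] for a in A_) for s in S_] for b in B_]

    ss_a = n * nb * sum((A[a] - G) ** 2 for a in A_)
    ss_b = n * na * sum((B[b] - G) ** 2 for b in B_)
    ss_ab = n * sum((AB[a][b] - A[a] - B[b] + G) ** 2 for a in A_ for b in B_)
    # Error terms: interaction of each effect with subjects.
    ss_as = nb * sum((AS[a][s] - A[a] - S[s] + G) ** 2 for a in A_ for s in S_)
    ss_bs = na * sum((BS[b][s] - B[b] - S[s] + G) ** 2 for b in B_ for s in S_)
    ss_abs = sum((y[s][a][b] - AB[a][b] - AS[a][s] - BS[b][s]
                  + A[a] + B[b] + S[s] - G) ** 2
                 for s in S_ for a in A_ for b in B_)

    def f_ratio(ss_eff, df_eff, ss_err, df_err):
        return (ss_eff / df_eff) / (ss_err / df_err), (df_eff, df_err)

    return {
        "A": f_ratio(ss_a, na - 1, ss_as, (na - 1) * (n - 1)),
        "B": f_ratio(ss_b, nb - 1, ss_bs, (nb - 1) * (n - 1)),
        "AxB": f_ratio(ss_ab, (na - 1) * (nb - 1),
                       ss_abs, (na - 1) * (nb - 1) * (n - 1)),
    }

# Made-up mean settings (degrees): 3 observers x 2 projections x 3 slants.
example = [
    [[12.0, 30.0, 55.0], [10.0, 24.0, 52.0]],
    [[15.0, 33.0, 58.0], [11.0, 29.0, 57.0]],
    [[9.0, 27.0, 50.0], [8.0, 20.0, 49.0]],
]
result = rm_anova_2way(example)
print(result)
```

Here factor A stands for projection condition and factor B for slant. The p-values would then be read from the F distribution with the returned degrees of freedom (for example with `scipy.stats.f.sf`, if SciPy is available).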
Figure S2: Results of simulations in which the new perspective projection model views the (Lp, Op) stimulus seen by human observers. The closed circles represent the slant perceived by human observers. The open circles represent the slants recovered by the model for a range of sampling efficiency parameters. Error bars represent the standard error of the mean.
The perspective projection method model with the expected gain decision rule presented in the main text shows some similarity to the data presented here. The model shows a marked underestimation of slant in certain conditions (e.g. figure 7c). Figure S2 shows the outcome when this model views the texel information seen by our observers in the (Lp, Op) condition.
We have also simulated the effect of sampling inefficiency. Because the presentation time in our experiment was short (150 ms), it is possible that our observers were unable to use all of the available texel information. The effect of reducing the proportion of texels used (selected at random from those in the stimulus set) in this simulated experiment is also shown in figure S2 (curves marked 0.05 – 0.5). When the model uses all the information present in the scene (the curve marked 1.0 in figure S2), its performance is considerably more veridical than human performance. When the model uses around 10% of the texels (randomly chosen) in the image, its performance is close to that of our human observers.
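The sampling efficiency manipulation amounts to passing a random subset of the texels to the model. A minimal sketch, assuming sampling without replacement and rounding to the nearest whole texel (the function name `subsample_texels` is introduced here for illustration; the slant estimator itself is described in the main text and is not reproduced):

```python
import random

def subsample_texels(texels, efficiency, rng):
    """Return a random subset containing a proportion `efficiency` of texels."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must lie in (0, 1]")
    n_keep = max(1, round(efficiency * len(texels)))
    return rng.sample(texels, n_keep)

# Hypothetical stimulus of 1000 texels, identified here by index only.
all_texels = list(range(1000))
subset = subsample_texels(all_texels, 0.1, random.Random(0))
print(len(subset))
```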
References
Brainard, D. H. (1997). The Psychophysics Toolbox. Spatial Vision, 10, 443-446.
Braunstein, M. L. (1968). Motion and texture as sources of slant information. Journal of Experimental Psychology, 78, 247-253.
Gibson, J. J. (1950). The Perception of the Visual World. Boston: Houghton Mifflin.
Gruber, H. E. & Clark, W. C. (1956). Perception of slanted surfaces. Perceptual and Motor Skills, 6, 97-106.
Pelli, D. G. (1997). The VideoToolbox software for visual psychophysics: Transforming numbers into movies. Spatial Vision, 10, 437-442.
Watt, S. J., Akeley, K., Ernst, M. O. & Banks, M. S. (2005). Focus cues affect perceived depth. Journal of Vision, 5, 834-862.