Abstract
Most people see movement in Figure 1, although the image is static. Motion is seen from black → blue → white → yellow → black. Many hypotheses for the illusory motion have been proposed, although none have been tested physiologically. We found that the illusion works well even if it is achromatic: yellow is replaced with light gray, and blue is replaced with dark gray. We show that the critical feature for inducing illusory motion is the luminance relationship of the static elements. Illusory motion is seen from black → dark gray → white → light gray → black. In psychophysical experiments, we found that all four pairs of adjacent elements when presented alone each produced illusory motion consistent with the original illusion, a result not expected from any current models. We also show that direction-selective neurons in macaque visual cortex gave directional responses to the same static element pairs, also in a direction consistent with the illusory motion. This is the first demonstration of directional responses by single neurons to static displays and supports a model in which low-level, first-order motion detectors interpret contrast-dependent differences in response timing as motion. We demonstrate that this illusion is a static version of four-stroke apparent motion.
Introduction
Each wheel in Figure 1 is composed of a repeating series of elements that produces a transient perception of motion with each eye movement or blink. The perceived direction is black → blue → white → yellow for the colored version or black → dark gray → white → light gray for the grayscale version. The illusion produces a strong sensation of motion if fixation is maintained and the illusion is moved or flashed on and off (supplemental video 1, available at www.jneurosci.org as supplemental material), which shows that simply refreshing retinal stimulation is sufficient to elicit the illusion. The illusion is a modification of the peripheral drift illusion, a saw-tooth luminance profile that induces a weak motion illusion along the black-to-white gradient (Fraser and Wilcox, 1979; Faubert and Herbert, 1999).
Kitaoka and Ashida (2003) proposed that the illusory motion in Figure 1 depends on the fact that black and white are higher contrast than dark gray and light gray (compared with the average gray of the entire display) and so produce faster responses in the visual system. Indeed, contrast-based differences in response timing of visual neurons exist (Shapley and Victor, 1978; Sestokas and Lehmkuhle, 1986; Maunsell and Gibson, 1992). Thus, a possible explanation would be that motion detectors, at some unspecified location in the brain, that span the contrast jumps between white/light gray or black/dark gray are activated on the higher-contrast side of each pair before being activated on the lower-contrast side and thus respond as if there were real motion in the image.
Such contrast-dependent response timing differences could explain the illusory motion signal elicited by the white/light gray or black/dark gray pairs, but motion signals arising from contrast-dependent latency differences for the other adjacent-element pairs, dark gray/white or light gray/black, which are just as abundant in the illusion, should be in the opposite direction. Therefore, a contrast-dependent latency-difference model (Kitaoka and Ashida, 2003) would require that these element pairs generate weaker signals, although there is no reason to expect any difference in response magnitude from different adjacent-element pairs.
Here we tested an alternative explanation. Although we agree that the white/light gray and black/dark gray pairs should generate motion signals from the higher-contrast element toward the lower, consistent with the illusion, because motion detectors are sensitive to the sign of contrast (Emerson et al., 1987; Conway and Livingstone, 2003; Livingstone and Conway, 2003), we suggest that the dark gray/white and light gray/black pairs might also generate motion signals that contribute to the illusion, analogous to “reverse-phi” [apparent motion spots that invert contrast appear to move in the opposite direction to the physical progression of the spots (Anstis, 1970)]. “Forward-phi” pairs comprise elements with the same sign of contrast, whereas reverse-phi pairs are opposite in contrast, relative to the average gray of the entire tiled pattern. The consistent forward-phi and reverse-phi signals could therefore be thought of as a static version of four-stroke apparent motion (Anstis and Rogers, 1986; Mather and Murdoch, 1999).
Because the ability of contrast-dependent latency differences to evoke motion signals has not been tested previously, either psychophysically or physiologically, the goal of this project was to ask whether pairs of stimuli of different contrasts could generate motion signals, both psychophysically and physiologically, and to ask which element pairs of the illusion in Figure 1 could be responsible for the powerful illusory motion.
Materials and Methods
Stimuli. Stimuli for both psychophysical and physiological experiments were presented on 21-inch monitors with a 75 Hz refresh rate (non-interlaced). The colors of the elements in the first set of psychophysical experiments were the same as the colors in the web-based versions of a similar illusion published previously by A. Kitaoka (http://www.ritsumei.ac.jp/~akitaoka/rotsnake.gif). The luminances of the elements in the grayscale version were chosen to match those of the illusion: white was 70 cd/m²; light gray was 40 cd/m²; dark gray was 30 cd/m²; black was <1 cd/m²; and the background gray was the average luminance, or 35 cd/m².
Psychophysics. For the psychophysical experiments, each stimulus consisted of four frames presented at 5 Hz against the intermediate gray background. Each frame consisted of a strip of 16 copies of the given element pair; each pair within the strip was separated by a space of average gray the width of the element pair. The element pairs of sequential frames were arranged so that the gray spacers in one frame were replaced by element pairs in the next frame (Fig. 2A) (supplemental video 2, available at www.jneurosci.org as supplemental material). Sixteen stimuli were used, eight colored and eight grayscale. The element pairs predicting rightward motion were blue/white (shown in Fig. 2A), white/yellow, yellow/black, and black/blue; those predicting leftward motion were white/blue, yellow/white, black/yellow, and blue/black. For the grayscale stimuli, blue was replaced with dark gray and yellow with light gray.
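The frame layout described above can be sketched in Python. This is only an illustration under stated assumptions: the strip is modeled as 32 slots, each one element pair wide, and the names build_frame, N_PAIRS, and N_FRAMES are hypothetical; the actual stimuli were generated with the Psychophysics toolbox in Matlab.

```python
N_PAIRS = 16   # element pairs per strip (from the text)
N_FRAMES = 4   # frames per stimulus (from the text)

def build_frame(frame_index, pair=("blue", "white"), spacer="gray"):
    """Return one frame as a list of slots, each slot one element pair wide.

    Assumption: pairs occupy even slots on even frames and odd slots on odd
    frames, so the gray spacers of one frame become pairs in the next.
    """
    slots = []
    for slot in range(2 * N_PAIRS):
        if (slot + frame_index) % 2 == 0:
            slots.append(pair)
        else:
            slots.append((spacer, spacer))
    return slots

frames = [build_frame(i) for i in range(N_FRAMES)]
```

Because each frame shifts the pairs by exactly half of the pair-plus-spacer period, a shift to the left and a shift to the right give the same next frame, which is why the sequence itself carries no net direction.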
Subjects were asked to fixate a small spot 2° above the row of stimuli and to report the direction in which the stimulus appeared to move on each trial. For each trial, which was self-initiated, subjects indicated by a button press whether they thought the strip of elements had moved to the right or to the left. The monitor was viewed at a distance of 50 cm. The stimuli were generated and displayed with the Psychophysics toolbox (Psychtoolbox Win 2.50, release 3), installed in Matlab 6.5 (MathWorks, Natick, MA). Ten subjects were tested for the color experiment and 10 for the grayscale experiment. All of the subjects were naive as to the goals of the experiment. None of the authors of this paper served as subjects. Six of the subjects participated in both experiments, and the symbols in Figure 2 indicating those subjects are ×, +, stars, and the three triangles.
Single-unit physiology. For the physiological experiments, alert macaque monkeys were prepared for chronic recording as described previously (Conway, 2001; Livingstone et al., 2001). All experiments were performed according to National Institutes of Health guidelines for the use of animals and with the approval of the Harvard Medical School Standing Committee on the use of animals. Eye position was monitored with a search coil in a magnetic field (Judge et al., 1980); the monitors were from DNI Inc. and CNC Engineering (Enfield, CT). Well-isolated single units were recorded using tungsten microelectrodes (Hubel, 1957) (Frederick Haer Company, Bowdoinham, ME) from three alert fixating macaque monkeys. Spikes were collected at 1 ms resolution; eye position was sampled at 250 Hz. The monitor screen was 100 cm in front of the monkey. The monkey was rewarded for keeping his gaze within 1° of a fixation spot, and spikes were rejected from analysis if they were collected while the monkey's gaze was not within 1° of the fixation spot.
Neurons were first screened for directionality using moving bars. The responses to each direction of motion, minus baseline firing, were used to calculate direction indices (D.I.s) as follows: (Rp - Rn)/(Rp + Rn), where Rp was the average response to the preferred direction of motion and Rn was the average response to the null direction. The direction index for moving bars can range from 0, for a cell that gives equal responses to the two directions, to 1, for a cell that responds only to a single direction, which is by definition the preferred direction; the direction index can be >1 for cells that show null-direction suppression.
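As a minimal sketch of the direction index defined above, the calculation can be written in Python. The function name and the example firing rates are hypothetical and are not taken from the recordings.

```python
def direction_index(pref_rate, null_rate, baseline):
    """Direction index (Rp - Rn) / (Rp + Rn) on baseline-subtracted responses."""
    rp = pref_rate - baseline
    rn = null_rate - baseline
    if rp + rn == 0:
        raise ValueError("summed response is zero; index is undefined")
    return (rp - rn) / (rp + rn)

# Hypothetical firing rates in spikes/s, for illustration only.
equal = direction_index(30.0, 30.0, 10.0)       # same response in both directions
one_way = direction_index(30.0, 10.0, 10.0)     # null response equals baseline
suppressed = direction_index(30.0, 5.0, 10.0)   # null response below baseline
```

The third case shows why the index can exceed 1: after baseline subtraction, a suppressed null-direction response is negative, which shrinks the denominator and enlarges the numerator.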
Each cell was then tested with flashed pairs of adjacent bars at the optimal orientation of the cell, against an intermediate gray background. The pairs were white and light gray, light gray and black, black and dark gray, and dark gray and white. Thus, the bar pairs were the same as the grayscale elements in the psychophysical experiment. The bar pairs were presented for 50 ms ON and 100 ms OFF, at random positions along a stimulus range, centered on the receptive field of the cell.
Each bar pair could appear in a congruent or an anti-congruent configuration, defined by the actual direction preference of the cell and the direction in the illusion for that particular element pair. For example, for a rightward-preferring cell, the congruent configuration for the white/light gray pair would be with the light gray bar to the right of the white bar, and the anti-congruent configuration would be with the light gray bar to the left of the white bar. Congruent and anti-congruent configurations of bar pairs were randomly interleaved. The congruency index (C.I.) for each cell for each element pair was as follows: (Rc - Rac)/(Rc + Rac), where Rc was the response to the congruent configuration and Rac was the response to the anti-congruent configuration. Responses were calculated as the total spikes over the entire response, minus baseline firing. Histograms of responses to congruent minus responses to anti-congruent stimuli, not normalized, are shown in supplemental Figure 1 (available at www.jneurosci.org as supplemental material).
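The congruency index can be sketched in the same way. The function name and the spike counts below are hypothetical; only the formula comes from the text.

```python
def congruency_index(congruent_spikes, anticongruent_spikes, baseline_spikes=0.0):
    """Congruency index (Rc - Rac) / (Rc + Rac) on baseline-subtracted spike counts."""
    rc = congruent_spikes - baseline_spikes
    rac = anticongruent_spikes - baseline_spikes
    if rc + rac == 0:
        raise ValueError("summed response is zero; index is undefined")
    return (rc - rac) / (rc + rac)

# Hypothetical baseline-subtracted spike counts, for illustration only.
ci_positive = congruency_index(60.0, 50.0)   # larger response to the congruent layout
ci_negative = congruency_index(50.0, 60.0)   # larger response to the anti-congruent layout
```

A positive value means the cell fired more to the configuration whose illusory direction matches its preferred direction; a negative value means the reverse.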
Thirty-nine direction-selective primary visual cortex (V1) cells were tested with flashed pairs of bars, and this population was divided into cells with low D.I.s to moving bars (D.I. < 0.3) and cells with high D.I.s (D.I. > 0.3). The middle temporal area (MT) was identified by magnetic resonance imaging before recording and during recording by the electrode depth, prevalence of directionally selective visual responses, receptive field size, and visual topography (Van Essen et al., 1981; Desimone and Ungerleider, 1986). Twenty cells were recorded in MT; all MT cells had a D.I. > 0.9.
Results
Psychophysics
We sought to explore the basis for the motion illusion first psychophysically, presenting each of the four adjacent-element pairs in the illusion independently: black/blue, blue/white, white/yellow, and yellow/black. We also tested the motion percept to the mirror image of each element pair: blue/black, white/blue, yellow/white, and black/yellow. A trial consisted of four frames of a given element pair. Each frame consisted of a strip of 16 of the given element pair; each pair within the strip was separated by a space of average gray the width of the element pair. The element pairs of sequential frames were arranged so that the gray spacers in one frame were replaced by element pairs in the next frame (Fig. 2A) (supplemental video 2, available at www.jneurosci.org as supplemental material). We presented 50 trials of each element pair and 50 of its mirror image randomly interleaved, for a total of 400 trials per subject. Subjects were asked to indicate whether the strip of rectangles appeared to move to the right or to the left, although there is no actual motion energy in the stimulus. Responses were categorized as consistent with the illusion if the perceived motion was in the same direction as the illusory motion of that element pair in Figure 1. For example, in rightward-moving parts of the illusion, the blue and the white elements are oriented with the blue element on the left of the white element, whereas in leftward-moving parts of the illusion, the blue element is on the right of the white element.
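The trial structure for one subject in the color experiment can be sketched as follows. This is an illustrative Python reconstruction, not the Matlab code used in the study; the names build_trial_list and is_consistent and the seed argument are assumptions.

```python
import random

# Left/right order of each pair in rightward-moving parts of the illusion.
PAIRS = [("black", "blue"), ("blue", "white"), ("white", "yellow"), ("yellow", "black")]
TRIALS_PER_CONFIGURATION = 50

def build_trial_list(seed=None):
    """Return a shuffled list of (pair, illusion_direction) trials."""
    rng = random.Random(seed)
    trials = []
    for left, right in PAIRS:
        for _ in range(TRIALS_PER_CONFIGURATION):
            trials.append(((left, right), "right"))   # pair as in rightward parts
            trials.append(((right, left), "left"))    # mirror image
    rng.shuffle(trials)
    return trials

def is_consistent(trial, reported_direction):
    """True if the report matches the illusory direction for that configuration."""
    _, illusion_direction = trial
    return reported_direction == illusion_direction

trials = build_trial_list(seed=0)
```

With four pairs, two configurations each, and 50 repeats, the list contains the 400 trials per subject stated in the text.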
We tried flashing single strips of 16 identical element pairs and did not observe any consistent motion signal. Therefore, the motion signal from a single (static or flashed) presentation is too weak to be observable. The sequence of frames that we used (e.g., supplemental video 2, available at www.jneurosci.org as supplemental material) consists of a strong but directionally ambiguous motion stimulus that we assume enhances any weak signal from each element pair.
Despite the fact that there was no net motion in any trial, subjects usually reported that there was (Fig. 2C); the average bias in the reported direction for each element pair indicates the contribution of that pair to the motion illusion. If the motion percept were produced simply by contrast-dependent differences in the latencies of response to the two elements, one would expect a motion signal in the consistent direction for the black/blue pair and the white/yellow pair (for which the observed illusory motion is in the direction from the higher-contrast element to the lower-contrast element) but the reverse for the blue/white and the yellow/black pairs (for which the observed motion is from the lower-contrast to the higher-contrast element). This is not what we found (Fig. 2C). Subjects reported seeing motion consistent with the illusory motion direction for all four element pairs.
We repeated the experiment using grayscale versions of the same stimuli, in which blue was replaced by dark gray and yellow by light gray (Fig. 2B). All subjects still tended to see motion in the direction consistent with the luminance order of the elements in Figure 1, for all four element pairs (Fig. 2D). A two-way ANOVA revealed a significant main effect (p < 0.0001) of contrast polarity (same vs different) but no effect of color (p > 0.3) and no interaction between color and polarity (p > 0.6). These results, summarized in Figure 2E, show first that the critical component of the illusion is the luminance relationship of the elements and the background and not their color. Second, they show that two of the adjacent-element pairs (white/yellow and black/blue, or white/light gray and black/dark gray) generate illusory motion in the direction predicted from the assumption that motion signals arise from a contrast-dependent latency difference, but the other two element pairs (yellow/black and blue/white, or light gray/black and dark gray/white) generate illusory motion in the direction opposite to that predicted from contrast-dependent latency differences. That is, for half of the element pairs, the illusory motion is perceived in the direction from the element with the higher contrast with the background toward the element with the lower contrast, but for the other half of the element pairs the illusory motion is perceived from the lower-contrast element toward the higher.
We suggest that the critical difference between these two sets of element pairs is that for the white/yellow (white/light gray) pair and the black/blue (black/dark gray) pair, the two elements are both lighter or both darker than the background, whereas for the yellow/black (light gray/black) pair and the blue/white (dark gray/white) pairs, one element is lighter than the background and the other is darker; the former element pairs are the same sign of contrast, and the latter element pairs are opposite in sign of contrast relative to the background.
Physiology
All models of static motion illusions invoke behavior of direction-selective cells (Fraser and Wilcox, 1979; Faubert and Herbert, 1999; Kitaoka and Ashida, 2003), yet this has never been tested: the responses of direction-selective cells to static motion illusions have never been measured. To investigate whether the illusory motion in Figure 1 could be explained by the activity of direction-selective cells, we recorded from 39 directional single units in V1 and 20 units in MT of three alert, fixating macaque monkeys. We asked three questions. (1) Are there contrast-dependent latency differences that could explain the effects? (2) Do the element pairs generate signals in direction-selective cells in macaque V1 and MT? (3) Are the directions of the responses consistent with the illusion, and if so, for which element pairs?
Figure 3 shows that there were contrast-dependent latency differences in the responses of direction-selective cells in V1 and MT. The traces show the average responses to each of the four luminance values chosen to match the elements of the illusion in Figure 1, for V1 (Fig. 3A) and MT (Fig. 3B). In directional cells in both V1 and MT, the white bar and the black bar generated responses whose peaks occurred 10-20 ms earlier than the peaks of the responses to the light gray bar and the dark gray bar, confirming previous results (Shapley and Victor, 1978; Sestokas and Lehmkuhle, 1986; Maunsell and Gibson, 1992). An apparent motion stimulus consisting of two stimuli presented 13 ms apart to adjacent parts of the receptive field of a direction-selective cell invariably generates directional responses in both V1 and MT of alert macaques (Livingstone et al., 2001; Conway and Livingstone, 2003), so we reasoned that such timing differences between the different elements used here could be sufficient to evoke directional responses. We sought to test this assumption.
We recorded from V1 and MT neurons to ask whether static presentations of the element pairs could actually generate directional responses in direction-selective neurons in the brain. A direction preference for each neuron was first measured with moving bars. Then we measured the responses of the cell to the two configurations of each element pair (aligned with the motion axis of the cell); we defined the congruent configuration as the one in Figure 1 that was consistent with the direction preference of the cell. For example, for a cell that preferred rightward motion, the congruent configuration for the white/light gray pair would be white on the left and light gray on the right because this is the configuration of the element pair that produces rightward illusory motion (Figs. 1, 2). The anti-congruent configuration for a rightward-preferring neuron would be light gray on the left and white on the right. Thus, for rightward-preferring cells, for element pairs with the same sign of contrast with the background (white/light gray and black/dark gray), the congruent configuration would be with the higher-contrast element on the left; for the element pairs of opposite sign of contrast with the background, the congruent configuration would be with the lower-contrast element on the left. Congruent and anti-congruent configurations were randomly interleaved. We compared average responses to congruent and anti-congruent stimulus configurations for all four luminance element pairs for each neuron to calculate a C.I. (see Materials and Methods). C.I.s were defined as positive if the response to the congruent configuration was larger than the response to the anti-congruent configuration, and negative for the reverse.
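The definition of congruent and anti-congruent configurations can be made concrete with a short Python sketch. It assumes, for simplicity, a cell whose motion axis is horizontal; the names RIGHTWARD_LAYOUT and configurations are hypothetical.

```python
# Left-to-right order of each grayscale pair in rightward-moving parts of the
# illusion, following black -> dark gray -> white -> light gray -> black.
RIGHTWARD_LAYOUT = {
    "black/dark gray": ("black", "dark gray"),
    "dark gray/white": ("dark gray", "white"),
    "white/light gray": ("white", "light gray"),
    "light gray/black": ("light gray", "black"),
}

def configurations(pair, preferred_direction):
    """Return (congruent, anti_congruent) left-to-right layouts for one cell."""
    left, right = RIGHTWARD_LAYOUT[pair]
    if preferred_direction == "right":
        congruent = (left, right)
    elif preferred_direction == "left":
        congruent = (right, left)
    else:
        raise ValueError("preferred_direction must be 'left' or 'right'")
    anti_congruent = congruent[::-1]   # mirror image of the congruent layout
    return congruent, anti_congruent

# Example from the text: a rightward-preferring cell and the white/light gray pair.
congruent, anti_congruent = configurations("white/light gray", "right")
```

For the rightward-preferring example, the congruent layout has white on the left and light gray on the right, matching the description in the text.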
Histograms of C.I.s for same-contrast pairs and opposite-contrast pairs are shown in Figure 4. We subdivided the direction-selective V1 population into those that were strongly directional to moving bars [high D.I. cells (Fig. 4A)] and those that were less directional to moving bars [low D.I. cells (Fig. 4A)]. Low D.I. cells (D.I. < 0.3) in V1 did not show any significant difference in their responses to the different configurations of the flashed element pairs (t test, p = 0.4 for the same-contrast pairs; p = 0.5 for the opposite-contrast pairs; p = 0.45 for all four element pairs combined). However, V1 cells with high D.I.s (Fig. 4A, black bars) had population C.I.s that were significantly greater than 0 for the same-sign-of-contrast pairs (one-tailed t test, p = 0.005), for the opposite-sign-of-contrast pairs (one-tailed t test, p = 0.014), and for all four contrast pairs combined (one-tailed t test, p = 0.0007). The fact that the C.I.s were on average greater than 0 means that the directionality of the responses was consistent both with the illusion and with the psychophysical experiments (Fig. 2D), for the same-sign-of-contrast pairs as well as for the opposite-sign-of-contrast pairs. The histograms in Figure 4B show that MT cells gave similar results: the same-sign-of-contrast pairs and the opposite-sign-of-contrast pairs produced directional responses consistent with the illusion and with the psychophysical experiments. The population average C.I. for the same-contrast pairs, the population average for the opposite-contrast pairs, and the population average for all four contrast pairs combined were all significantly greater than 0 (i.e., in a direction consistent with the illusion; one-tailed t test, p = 0.0004, p = 0.00008, and p = 0.000006, respectively).
The congruency indices shown in Figure 4 are normalized to average activity and are therefore comparable with direction indices. A histogram of the average raw difference in number of spikes for the congruent minus the anti-congruent responses for V1 and MT is shown in supplemental Figure 1 (available at www.jneurosci.org as supplemental material), for all four bar pairs averaged.
In both V1 and MT, strongly directional cells responded to static stimuli as if those stimuli contained a motion signal, and that motion signal was in the same direction as the illusory motion observed in Figure 1. By comparing the medians of each histogram in Figure 4, we can see that for cells in both V1 and MT, the responses to the same-sign-of-contrast element pairs were slightly more directional than the responses to the opposite-sign-of-contrast pairs, and that the responses of MT cells were more directional than the responses of V1 cells.
Discussion
Both the psychophysical results and the physiological results indicate that motion signals in the Rotating Snakes illusion (Fig. 1) arise in the direction black → dark gray, white → light gray, dark gray → white, and light gray → black. Visual neurons respond faster to the higher-contrast white and black elements than to the lower-contrast light gray and dark gray elements (Shapley and Victor, 1978; Sestokas and Lehmkuhle, 1986; Maunsell and Gibson, 1992) (Fig. 3). Therefore, the motion signals generated by the black → dark gray and the white → light gray pairs are in the direction from the faster response to the slower response, which makes sense because such contrast-dependent timing differences would mimic the sequence of a stimulus that moved from the position of the higher-contrast element to that of the lower.
However, the motion signals generated by the dark gray → white and the light gray → black pairs are in the direction from the slower response to the faster response, which is paradoxical. This paradox may be resolved by considering the fact that dark gray and white are opposite in sign of contrast from the average gray, as are light gray and black. Element pairs that produce motion signals in a direction consistent with their timing differences have the same sign of contrast compared with the average gray, and element pairs that generate motion signals opposite to their timing differences are opposite in sign of contrast. The pattern of responses to these static motion stimuli is analogous to the phenomenon of reverse-phi motion, in which apparent-motion stimulus pairs that invert contrast appear to move in the direction opposite to their physical motion (Anstis, 1970; Anstis and Rogers, 1975).
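The argument of the last two paragraphs can be condensed into a toy rule, sketched here in Python. It is an illustration of the reasoning only, not a model fitted to the data. It assumes that the element with the larger absolute contrast responds first, that same-sign pairs signal motion from the earlier to the later response, and that opposite-sign pairs signal the reverse. The luminances are the grayscale values from Materials and Methods, with black set to an assumed 1 cd/m² because the text gives only <1 cd/m²; all names are hypothetical.

```python
BACKGROUND = 35.0   # cd/m^2, average gray
LUMINANCE = {"white": 70.0, "light gray": 40.0, "dark gray": 30.0,
             "black": 1.0}   # black is reported as <1 cd/m^2; 1.0 is an assumption

def signed_contrast(element):
    """Contrast relative to the background; the sign marks lighter or darker."""
    return (LUMINANCE[element] - BACKGROUND) / BACKGROUND

def predicted_direction(left, right):
    """Return +1 for a rightward signal and -1 for a leftward signal (toy rule)."""
    c_left, c_right = signed_contrast(left), signed_contrast(right)
    # Timing term: the higher-contrast element is assumed to respond first.
    timing = 1 if abs(c_left) > abs(c_right) else -1
    # Polarity term: same sign keeps the timing direction (forward-phi),
    # opposite sign inverts it (reverse-phi).
    polarity = 1 if c_left * c_right > 0 else -1
    return timing * polarity

# Pairs laid out left to right as in rightward-moving parts of the illusion.
rightward_pairs = [("black", "dark gray"), ("dark gray", "white"),
                   ("white", "light gray"), ("light gray", "black")]
directions = [predicted_direction(left, right) for left, right in rightward_pairs]
```

Under these assumptions all four pairs yield the same sign, whereas the timing term alone would give opposite signs for the two opposite-contrast pairs.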
We have shown previously that both complex direction-selective neurons in V1 and neurons in MT respond better to apparent-motion sequences that flash along the null direction if the sequences invert contrast (an opposite direction preference to drifting bars or flashed stimuli of constant contrast); in other words, these neurons show reverse-phi to temporal sequences (Livingstone et al., 2001; Livingstone and Conway, 2003). Here we show that these neurons also show reverse-phi to static pairs of stimuli that are spatially offset, in which the timing asynchrony is introduced by differences in contrast between the stimuli and the average gray. For both the psychophysical experiments and the physiological experiments, the forward-phi (same-sign-of-contrast) element pairs showed slightly stronger directionality than the reverse-phi (opposite-sign-of-contrast) pairs.
The congruency indices, which measure the contribution of each cell to the illusion, were smaller than the direction indices for both V1 and MT; e.g., all MT cells had a D.I. > 0.9, whereas the median congruency index in MT was 0.09. This means that for moving stimuli, on average, the response of MT cells to preferred motion was more than 10 times the response to null-direction motion, but for static stimuli, the responses to the congruent configurations were on average only 20% larger than responses to the anti-congruent configurations. We do not believe that the congruency indices are too small to account for the illusory motion. We showed that direction-selective cells respond more to one static configuration than to its mirror image, and that this response bias consistently corresponds to the actual direction preference of each directional cell. We suggest that even a small bias averaged over a large population of directional cells should result in a directional neuronal signal that would be indistinguishable from a response to an actual moving stimulus. Although the congruency indices were small for any one element pair, the pattern of elements in Rotating Snakes is repetitive, and our results indicate that every pair of elements in the continuous pattern contributes a signal that mimics a consistent direction of motion. Single forward-phi or reverse-phi signals are each difficult to see in isolation, but combined in a directionally consistent manner, these signals can generate a powerful impression of continuous unidirectional motion (Anstis and Rogers, 1986). In addition, the potent illusory motion of Rotating Snakes may reflect not only the cumulative sensitivity of cells in V1 and MT to the motion signals elicited by the individual element pairs, which we have shown to be the building blocks of the illusion, but also the sensitivity to rotary motion of cells in other areas, such as the dorsal region of the medial superior temporal area (Saito et al., 1986).
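The comparison between the two indices at the start of this paragraph follows from inverting the index formula: if I = (A - B)/(A + B), then A/B = (1 + I)/(1 - I). A short Python check, with a hypothetical function name, is:

```python
def index_to_ratio(index):
    """Convert an index I = (A - B) / (A + B) into the response ratio A / B."""
    if index >= 1:
        raise ValueError("the ratio is undefined for an index of 1 or more")
    return (1 + index) / (1 - index)

di_ratio = index_to_ratio(0.9)    # preferred versus null response at D.I. = 0.9
ci_ratio = index_to_ratio(0.09)   # congruent versus anti-congruent at C.I. = 0.09
```

A D.I. of 0.9 corresponds to a preferred response about 19 times the null response, consistent with "more than 10 times", and a C.I. of 0.09 corresponds to a ratio of about 1.2, that is, a congruent response roughly 20% larger than the anti-congruent response.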
That MT cells showed a more robust physiological correlate of this motion illusion than V1 cells does not indicate that the basis for the illusion arises in MT rather than in V1. In fact, the most directional V1 cells showed responses consistent with the illusion, and MT cells receive input from the most directional V1 cells (Movshon and Newsome, 1996). Moreover, because the illusion (and directional cells in V1 and MT) shows reverse-phi, which is a characteristic of first-order motion signals (Braddick, 1980; Lu and Sperling, 1995), the underlying motion signals must arise in cells at or before the simple-cell stage in V1 (Livingstone et al., 2001; Livingstone and Conway, 2003). Presumably, then, the basis for the illusion is in V1 but becomes more evident when signals are pooled in MT, just as the illusion becomes stronger when the basic element pairs are repeated throughout the visual field (Fig. 1).
Our psychophysical and physiological findings indicate that timing differences between responses to different contrast elements can account for the illusory motion observed in Rotating Snakes and provide the first evidence that these direction signals arise in direction-selective neurons in V1. All four adjacent element pairs in the illusion generate a motion signal in the same direction, which partly explains why the illusion is so powerful. In this sense, it is a static analog of four-stroke apparent motion (Anstis and Rogers, 1986).
Footnotes
This work was supported by National Eye Institute Grants EY13135 (M.S.L.), EY11379 (Richard Born), and EY12196, National Science Foundation Grant BCS-0235398 (C.C.P.), the Japan Society for the Promotion of Science (A.K.), Office of Naval Research Grant N00014-01-1-0624 (which partly supported A.Y.), the Harvard Society of Fellows, and the Harvard Milton Fund (B.R.C.). We are grateful to Dr. Richard Born, who provided two of the monkeys used in this study.
Correspondence should be addressed to Dr. Bevil R. Conway, Department of Neurobiology, Harvard Medical School, 220 Longwood Avenue, Boston, MA 02115. E-mail: bconway{at}hms.harvard.edu.
Copyright © 2005 Society for Neuroscience 0270-6474/05/255651-06$15.00/0