Abstract
Studying the mismatch between perception and reality helps us better understand the constructive nature of the visual brain. The Pinna–Brelstaff motion illusion is a compelling example illustrating how a complex moving pattern can generate an illusory motion perception. When an observer moves toward (expansion) or away (contraction) from the Pinna–Brelstaff figure, the figure appears to rotate. The neural mechanisms underlying the illusory complex-flow motion of rotation, expansion, and contraction remain unknown. We studied this question at both perceptual and neuronal levels in behaving male macaques by using carefully parametrized Pinna–Brelstaff figures that induce the above motion illusions. We first demonstrate that macaques perceive illusory motion in a manner similar to that of human observers. Neurophysiological recordings were subsequently performed in the middle temporal area (MT) and the dorsal portion of the medial superior temporal area (MSTd). We find that subgroups of MSTd neurons encoding a particular global pattern of real complex-flow motion (rotation, expansion, contraction) also represent illusory motion patterns of the same class. They require an extra 15 ms to reliably discriminate the illusion. In contrast, MT neurons encode both real and illusory local motions with similar temporal delays. These findings reveal that illusory complex-flow motion is first represented in MSTd by the same neurons that normally encode real complex-flow motion. However, the extraction of global illusory motion in MSTd from other classes of real complex-flow motion requires extra processing time. Our study illustrates a cascaded integration mechanism from MT to MSTd underlying the transformation from external physical to internal nonveridical flow-motion perception.
SIGNIFICANCE STATEMENT The neural basis of the transformation from objective reality to illusory percepts of rotation, expansion, and contraction remains unknown. We demonstrate psychophysically that macaques perceive these illusory complex-flow motions in a manner similar to that of human observers. At the neural level, we show that neurons in the dorsal portion of the medial superior temporal area (MSTd) represent illusory flow motions as if they were real by globally integrating middle temporal area (MT) local motion signals. Furthermore, while MT neurons reliably encode real and illusory local motions with similar temporal delays, MSTd neurons take a significantly longer time to process the signals associated with illusory percepts. Our work extends previous complex-flow motion studies by providing the first detailed analysis of the neuron-specific mechanisms underlying complex forms of illusory motion integration from MT to MSTd.
Introduction
Global complex-flow motion patterns such as rotation, radial expansion and contraction, and spiral motions are crucial for navigating through the external world (Gibson, 1950). Motion perception and integration have been extensively studied both psychophysically and physiologically. For example, in physiology, motion integration has been carefully studied in primary visual cortex (V1) and middle temporal area (MT) using plaid stimuli (Movshon et al., 1983; Uwe and Guillaume, 2010), and end-stopped neurons in V1 (Pack et al., 2003) and most MT neurons (Pack and Born, 2001) are found to be capable of resolving the aperture problem during motion signal integration (Bradley and Goyal, 2008). Although translational direction signals are encoded in V1 (Hubel and Wiesel, 1968), MT (also known as V5; Zeki, 1974; Maunsell and Van Essen, 1983; Albright et al., 1984), and medial superior temporal area (MST; Zeki, 1980; Tanaka et al., 1986), neural correlates of global complex-flow motion are first encountered in the dorsal portion of MST (MSTd; Saito et al., 1986; Graziano et al., 1994; Lagae et al., 1994; Smith et al., 2006; Britten, 2008). MSTd neurons have been hypothesized to integrate local translational motion signals from early visual cortices into global complex-flow motion perceptions (Wurtz and Duffy, 1992; Warren and Saunders, 1994; Royden, 2002; Layton et al., 2012; Mineault et al., 2012; Layton and Fajen, 2016; Yu et al., 2018); however, the details remain elusive.
Visual illusions have fascinated mankind for thousands of years, and as the Czech physiologist Jan Purkinje remarked 150 years ago, “illusions contain visual truth.” The Pinna–Brelstaff figure (Fig. 1A) induces a striking example of illusory complex-flow motion perception (Pinna and Brelstaff, 2000). Illusory clockwise (CW) and counterclockwise (CCW) rotations are vividly perceived upon approaching or receding from the concentric rings of the Pinna–Brelstaff figure (Fig. 1B, left). Additionally, illusory expansion and contraction are perceived during real CW or CCW rotation of the figure (Fig. 1B, right). The strength of these illusions and their motion directions (CW vs CCW rotation and expansion vs contraction) critically depend on the shape and arrangement of the local micropatterns, such as the orientation and edge polarity of the small rhombi (Pinna and Brelstaff, 2000). The biased local motion caused by the aperture effect within the micropatterns of the Pinna–Brelstaff figure is presumed to be responsible for the generation of illusory motion (Gurnsey et al., 2002; Gurnsey and Pagé, 2006). This makes the Pinna–Brelstaff figure an ideal stimulus with which to study the neural mechanisms underlying the transformation from real to illusory motion (Fig. 1B). Using human fMRI, we have identified the cortical locus that represents the perception of Pinna illusory rotation (Pan et al., 2016; Wang et al., 2018). In these initial studies, the MST subregion of the human MT complex (hMT+) was most significantly correlated with illusory rotation. However, fMRI is unable to reveal the exact cell-specific mechanisms that underlie illusory complex-flow motion perception or its time course.
We therefore undertook recordings in the awake macaque monkey to probe both the psychophysical and electrophysiological responses of MSTd and MT to Pinna–Brelstaff figures. To test whether macaques can perceive the Pinna illusion, we used a carefully parametrized Pinna–Brelstaff figure composed of oriented Gabors (Fig. 1C; Gurnsey et al., 2002). We first performed equivalent psychophysical discrimination tasks on both human and nonhuman primates (Fig. 1D) and obtained comparable psychometric functions for both. Physiologically, we demonstrate a bottom-up integrative neural mechanism between MT and MSTd underlying the perception of illusory global flow motion. Specifically, subgroups of MSTd neurons represent the same classes of complex-flow motions regardless of whether they are real or illusory, yet the representation of illusory motion is temporally delayed when compared with the same class of real motion in MSTd.
Materials and Methods
Ethical approval
Human subjects gave written consent to the procedure in accordance with institutional guidelines and the Declaration of Helsinki, and the experimental procedures were approved by the Biomedical Research Ethics Committee of Shanghai Institutes for Biological Sciences (No. ER-SIBS-221305). All subjects had normal or corrected-to-normal vision and had no history of psychiatric or neurological disorders. All primate experimental procedures were approved by the Animal Care and Use Committee of the Institute of Neuroscience and by the local ethical review committee of the Shanghai Institutes for Biological Sciences (No. ER-SIBS-221204P). All experimental procedures were also in accordance with the National Institutes of Health Guide for the Care and Use of Laboratory Animals.
Psychophysical experiments on human subjects
Human subjects.
A total of nine human subjects were recruited for this study, including seven males and two females. All subjects had normal or corrected-to-normal vision and ranged in age from 22 to 30 years.
Visual stimuli.
We used an optimized stimulus version of the Pinna–Brelstaff illusion (Gurnsey et al., 2002), consisting of 10 concentric rings formed by symmetric Gabor patches of the same orientation relative to the radial axis. This is different from the original Pinna stimulus shown in Figure 1, where adjacent rings have opposite tilt. Each concentric ring was composed of 25 Gabor patches and was scaled corresponding to its retinal eccentricity. The origin of radial and rotary motion was always at the center of the visual display. Three classes of stimulus patterns were generated using relative Gabor orientations of +45°, 0°, and −45° (Fig. 1C), and with the Gabor width of 1.5 periods of a sinewave grating. Stimulus patterns were presented on a medium gray background. All stimuli were generated with MATLAB (MathWorks; RRID:SCR_001622), running the Psychophysics toolbox (RRID:SCR_002881, Kleiner et al., 2007). They were presented on a CRT monitor (model CPD-G520, SONY) with a refresh rate of 100 Hz. The gamma value of the monitor was calibrated using a ColorCal Photometer (Cambridge Research Systems). The distance between the subject's eye and the screen was 57 cm, resulting in a visual angle of 30° (height) × 40° (width).
Procedure.
Subjects were seated with their head stabilized by a forehead–chin rest and were asked to maintain fixation on a red spot presented at the center of the CRT and to respond by tapping the arrow keys on a standard computer keyboard. Two different psychophysical experiments were used. In the first, we used a three-alternative forced-choice (3AFC) paradigm (Fig. 1D, top), where subjects had to report the class of illusory complex-flow motion regardless of the real motion.
Pinna–Brelstaff figures made up of Gabor orientations of ±45° generate the perception of different classes of illusory motion when physically moved, while Pinna–Brelstaff figures made up of 0° orientation Gabors produce no illusions (Fig. 2). The angular speed for rotation was fixed at ±30°/s (negative, CCW rotation; positive, CW rotation), while radial (linear) speed was fixed at ±5°/s (negative, contraction; positive, expansion). Only one of these four motion conditions was shown for each trial. After 1.5 s of stimulus presentation, a choice panel with three different motion patterns was presented (for radial motion trials these were as follows: CCW, stationary, and CW; and for rotation: contracting, stationary, and expanding), and subjects were instructed to match the perceived illusory motion to the corresponding choice pattern. There were 12 conditions (three Pinna–Brelstaff figures × four motion conditions) in one block, and each condition was repeated 10 times. The total number of trials was therefore 120, with trials presented in random order within each block. In the second paradigm, the salience of the illusory complex-flow motion was quantified using a two-alternative forced-choice (2AFC) task (Fig. 1D, bottom). To measure the strength of illusory rotary motion, radial motion (inducing stimulus) was fixed at a speed of 5°/s or −5°/s, while the speed of the real nulling rotary motion was varied between −30°/s and 30°/s. When the strength of the illusory radial motion was measured, the speed of rotary motion (inducing stimulus) was fixed at 30°/s or −30°/s, while the speed of the real nulling radial motion was varied between −5°/s and 5°/s. Please note that the large difference between the values of rotary and radial speed is due to the different measurement criteria, as follows: radial speed is denoted by linear speed, and rotary speed is denoted by angular speed; an angular speed of 30°/s corresponds to an average linear speed of ∼5.5°/s.
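The relation between the angular and linear speeds quoted above can be checked with a short calculation. The following is a minimal sketch, assuming a planar (small-angle) approximation in which a point at eccentricity r rotating at angular speed ω moves at linear speed ω·r, with ω in radians per second. The function names are hypothetical, and the mean eccentricity it returns is only what the quoted pair of numbers (30°/s and ∼5.5°/s) implies; it is not a stimulus parameter reported in the text.

```python
import math

def linear_speed(angular_speed_deg_s, eccentricity_deg):
    """Tangential speed (deg of visual angle per s) of a point rotating about
    fixation, under a planar small-angle approximation (an assumption here)."""
    return math.radians(angular_speed_deg_s) * eccentricity_deg

def implied_mean_eccentricity(angular_speed_deg_s, mean_linear_speed_deg_s):
    """Eccentricity (deg) at which the given angular speed yields the given
    linear speed; derived from the two quoted numbers, not from the stimulus."""
    return mean_linear_speed_deg_s / math.radians(angular_speed_deg_s)

if __name__ == "__main__":
    ecc = implied_mean_eccentricity(30.0, 5.5)
    print(f"implied mean eccentricity: {ecc:.1f} deg")
    print(f"linear speed at that eccentricity: {linear_speed(30.0, ecc):.2f} deg/s")
```

Under these assumptions, the quoted equivalence corresponds to an average eccentricity of roughly 10.5° across the rings.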
The intervals of the various speeds of rotary and radial motion were determined using a Bayesian adaptive staircase method (psi-marginal method; Prins, 2013) calculated using the Palamedes toolbox for MATLAB (RRID:SCR_006521; Prins and Kingdom, 2018). Bayesian adaptive staircase methods optimize the sampling and estimation of both the psychometric threshold and slope by incorporating the subject's responses into a prior distribution that determines the values tested subsequently. The adaptive sampling places more trials at more informative speed values, and the circle size in Figure 3, A, B, D, and E, reflects the number of repeated trials at that particular value.
Data analysis.
For the 3AFC paradigm, the choice proportion (percentage) for illusory motion was calculated for each of the 12 conditions. For each motion pattern, the choice proportion was obtained by dividing the number of times that motion pattern was selected by the overall number of trials of the corresponding condition. For the 2AFC paradigm, we separately calculated the choice proportion for clockwise rotation in response to two radial motion conditions, and the choice proportion for expansion in response to two rotary motion conditions. The results were fitted using the Palamedes toolbox with a logistic psychometric function (PF) of the following form:

$$F_L(x; \alpha, \beta) = \frac{1}{1 + \exp\left(-\beta (x - \alpha)\right)}$$

where α corresponds to the threshold, F_L(x = α) = 0.5, and β determines the slope of the PF.
Optimization of the PF was done using maximum likelihood of the following form:

$$L(a, b \mid y) = \prod_{k=1}^{N} p(y_k \mid x_k; a, b)$$

where p(y_k | x_k; a, b) is the probability of observing response y (in our 2AFC task, the responses are typically “correct” or “incorrect”) on trial k given stimulus intensity x_k and assuming threshold α = a and slope β = b of the PF, and N is the number of trials.
The point of subjective equivalence (PSE) was then derived from the fitted curves. The PSE represents the speed of motion that eliminates the opposing illusory motion, resulting in the subject responding at chance (50% choice proportion) since no motion can be perceived. For the control condition, the PSE motion value should be zero since no illusory motion is induced and subjects should fail to see the real motion only when its speed is zero. PSE value differences (Δ PSEs) were obtained by subtracting the PSE values for the ±45° conditions from the PSE values for the 0° control condition.
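To make the fitting and PSE steps concrete, the sketch below fits a two-parameter logistic PF by maximum likelihood and reads off the PSE as the 50% point. It is illustrative only: it uses a brute-force grid search rather than the Palamedes routines used in the study, the function names are hypothetical, and the choice counts are synthetic values generated from known parameters purely to show that the procedure recovers them (they are not data from this study).

```python
import math

def logistic(x, alpha, beta):
    """Logistic PF: equals 0.5 at x = alpha; beta sets the slope."""
    return 1.0 / (1.0 + math.exp(-beta * (x - alpha)))

def neg_log_likelihood(alpha, beta, speeds, n_chosen, n_total):
    """Negative log-likelihood of binomial choice counts under the logistic PF."""
    nll = 0.0
    for x, k, n in zip(speeds, n_chosen, n_total):
        p = min(max(logistic(x, alpha, beta), 1e-9), 1.0 - 1e-9)  # avoid log(0)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_logistic(speeds, n_chosen, n_total, alphas, betas):
    """Grid-search maximum likelihood (a stand-in for a proper optimizer)."""
    best = min((neg_log_likelihood(a, b, speeds, n_chosen, n_total), a, b)
               for a in alphas for b in betas)
    return best[1], best[2]

def delta_pse(pse_control, pse_tilted):
    """Delta PSE: control (0 deg) PSE minus the PSE of a tilted-Gabor condition."""
    return pse_control - pse_tilted

# Synthetic example: counts generated from alpha = 10 deg/s, beta = 0.2.
speeds = [-30, -20, -10, 0, 10, 20, 30]
n_total = [1000] * len(speeds)
n_chosen = [round(n * logistic(x, 10.0, 0.2)) for x, n in zip(speeds, n_total)]

alphas = [a * 0.5 for a in range(-60, 61)]   # candidate thresholds, -30..30 deg/s
betas = [b * 0.02 for b in range(1, 51)]     # candidate slopes, 0.02..1.0
pse_hat, slope_hat = fit_logistic(speeds, n_chosen, n_total, alphas, betas)
```

Because the logistic PF crosses 50% at x = α, the fitted threshold is the PSE; `delta_pse` then follows the subtraction order stated above (control minus ±45° condition).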
Psychophysical experiments on nonhuman primates
Nonhuman primates.
Two male rhesus monkeys (Macaca mulatta), weighing 6–8 kg, participated in the experiments. No power calculations, sample exclusions, blinding, or randomization were performed.
Visual stimuli.
Visual stimuli were identical to those described for the human psychophysical experiments.
Animal preparation.
A head post made of titanium alloy was initially implanted under sterile conditions for head stabilization of the rhesus macaques. The eye position of the monkey on the screen was measured using an EyeLink 2000 Eye Tracker (SR Research; RRID:SCR_009602).
Procedure.
Monkeys were initially trained using the 0° Pinna–Brelstaff figure (which elicits no illusory motion), until they could reliably distinguish between different kinds of real complex-flow motions. Once their performance was stable, we moved to the 2AFC paradigm. To measure the strength of illusory rotation, the speed of real nulling rotation was varied from −50° to 50°/s in 10 steps (detailed values: −50, −30, −15, −10, −5, 5, 10, 15, 30, and 50°/s), while the speed of real radial motion was fixed at ±5°/s. To measure the strength of illusory radial motion, the speed of real nulling radial motion was varied from −5° to 5°/s in 10 steps (detailed values: −5.0, −3.0, −2.0, −1.0, −0.5, 0.5, 1.0, 2.0, 3.0, and 5.0°/s), while the speed of real rotation was fixed at ±50°/s. Choice stimuli illustrating contraction and expansion were shown after the presentation of the fixed rotary motion conditions, whereas choice stimuli rotating in a CCW or CW direction were shown after the presentation of fixed radial motion conditions. Monkeys needed to saccade to the choice target that matched the real complex-flow motion pattern to receive a reward. There were 30 conditions in one block, and each condition was repeated 15 times, so the total number of trials was 450. Each trial was randomly presented in one block.
Data analysis.
The procedure was identical to that used for the human results.
Neural physiological experiments on nonhuman primates
Animal preparation.
A plastic polyetheretherketone (PEEK) chamber was implanted in a position determined for each animal based on its MRI scan. After 1 month of recovery from surgery, monkeys were trained to perform passive fixation tasks, during which a random dot field and Pinna–Brelstaff stimuli alternated. The monkey's eye position on the screen was measured using an EyeLink 2000 Eye Tracker (SR Research; RRID:SCR_009602). The monkey was rewarded with a drop of water when fixation was maintained within a 1° radius window throughout a 2 s trial (a 300 ms blank followed by a 1500 ms stimulus presentation and a 200 ms blank).
Electrophysiological neural recordings.
Neural responses were recorded using tungsten electrodes (Microprobes for Life Science), and spike signals were collected by an OmniPlex D Neural Data Acquisition System (Plexon; RRID:SCR_014803). Spikes were sorted, and the unit types (single unit or multiunit) were identified using Offline Sorter software (Plexon; RRID:SCR_000012). MT and MSTd were first localized by using the MRI scan with an electrode in situ (Fig. 4A, top). We used pyElectrode software (Daye et al., 2013), combined with the receptive field (RF) response properties of the recorded units, for confirmation of individual penetration location (Fig. 4A, bottom, B,C). Both MT and MSTd are highly selective for motion, but differ in the following three ways: (1) The RF size of MSTd is much larger than the RF of MT (Tanaka et al., 1986); (2) MT units exhibit a strong linear relationship between retinal eccentricity and RF size, whereas no clear correlation has been found for MSTd units (Tanaka et al., 1986); and (3) MT cells respond strongly to the direction of linear motion, but show little or no preference for expansion, contraction, and rotation. Within MSTd, a subgroup of cells responds to complex motion patterns and in some cases responds exclusively to complex-flow motions (Saito et al., 1986).
After isolating a unit, the RF was hand mapped using computer-generated random-dot fields with translational, radial, and rotary motions. For an identified MSTd unit, we used a random-dot field (dot size, 0.3°; dot density, 1 dot/°²) to test its linear motion directional tuning and/or preferred complex-flow motion pattern. The strength of direction or complex-flow motion tuning was measured using a direction discrimination index (DDI; Fetsch et al., 2007). Only units with a DDI of >0.5 for complex-flow motion tuning were selected for further study. For MT units, we used the same procedures except that we used a DDI of >0.5 in response to translational motion stimuli. The three types of Pinna–Brelstaff stimuli were presented to all selected units. Stimuli were centered at the middle of the screen, regardless of the location of the RFs, because the perceptual illusion in humans does not occur when the origin of motion is peripherally placed. MT units with strong surround suppression were excluded from testing, because the response was inhibited when a full-screen Pinna–Brelstaff figure was presented. A subgroup of MT units was tested with masked Pinna–Brelstaff stimuli restricted to their RFs.
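The DDI selection criterion can be illustrated as follows. The text does not spell out the formula, so this sketch assumes the commonly cited definition, DDI = (Rmax − Rmin) / (Rmax − Rmin + 2·√(SSE/(N − M))), where Rmax and Rmin are the largest and smallest mean responses across stimulus conditions, SSE is the sum of squared deviations of single-trial responses from their condition means, N is the total number of trials, and M is the number of conditions. Both the formula and the function name should be checked against Fetsch et al. (2007) before use.

```python
import math

def direction_discrimination_index(responses_by_condition):
    """DDI under the assumed definition given in the lead-in.

    responses_by_condition: list of lists, one inner list of single-trial
    firing rates (spikes/s) per motion direction or complex-flow pattern.
    """
    means = [sum(r) / len(r) for r in responses_by_condition]
    sse = sum((x - m) ** 2
              for r, m in zip(responses_by_condition, means) for x in r)
    n_trials = sum(len(r) for r in responses_by_condition)
    n_conditions = len(responses_by_condition)
    spread = max(means) - min(means)
    denom = spread + 2.0 * math.sqrt(sse / (n_trials - n_conditions))
    return spread / denom if denom > 0 else 0.0

# Hypothetical single-trial rates for three motion patterns (not recorded data).
example = [[10.0, 12.0], [30.0, 32.0], [20.0, 22.0]]
selected = direction_discrimination_index(example) > 0.5  # selection criterion
```

The index approaches 1 when the tuning modulation is large relative to trial-to-trial variability and approaches 0 when variability dominates, which is why a fixed cutoff such as 0.5 can serve as a selectivity criterion.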
Visual stimuli.
The Pinna–Brelstaff stimuli used in the electrophysiological recordings were the same as those used in the psychophysical experiments. The use of full-screen stimuli maximized the neural responses in both MT and MST. We showed patterns with pure radial or rotary motion, keeping the speeds constant (radial motion, ±5°/s; rotary motion, ±30°/s) to generate the peristimulus time histograms (PSTHs) detailed in Figure 5 (each condition was repeated 5–10 times depending on the recording stability). For generation of the polar plots in Figure 6, the following radial speeds were presented: 5, 3, 1, 0, −1, −3, −5, −3, −1, 0, 1, and 3°/s for each condition. The rotary speeds were as follows: 0, −10, −20, −30, −20, −10, 0, 10, 20, 30, 20, and 10°/s. These two motion parameters were combined to create complex-flow motion patterns for each trial (Fig. 6A; each condition was repeated 5–10 times depending on the recording stability). Every block of trials contained a blank condition displaying a uniform median gray background, serving as a baseline for subsequent analyses. The total ocular fixation time was 2000 ms, with a 300 ms prestimulus time, a 1500 ms stimulus-presenting time, and a 200 ms poststimulus time. During the prestimulus and poststimulus periods, no stimulus was shown on the screen, only a median gray background. Stimuli were presented on a gamma-corrected CRT monitor (model P1230, HP) with a refresh rate of 100 Hz. The distance between the eyes of the monkey and the screen was 57 cm; the size of the screen was 1600 × 1200 pixels, subtending a visual angle of 40° (width) × 30° (height).
Data presentation and analysis.
Unit responses obtained using pure radial or rotary motion were illustrated using raster and PSTH plots. The bin width for the PSTH was 10 ms, the moving time window was 10 ms, and time ranged from −200 ms (negative means the time before stimulus presentation) to 1800 ms, covering the whole length of the trial. For each unit, there were a total of four facilitative and suppressive response cases: one facilitative and one suppressive case for the +45° condition; one facilitative and one suppressive case for the −45° condition (Fig. 5A,B), and the N for the firing rate scatter plot represents the number of cases.
The reliability of discriminating illusory and real motion was measured using receiver operating characteristic (ROC) analysis (Green and Swets, 1966; Britten et al., 1992, 1996; Celebrini and Newsome, 1994; Price and Born, 2010). ROC analysis compares two conditions (one containing the signal, and the other the no-signal condition), and uses the firing rates of every trial in each condition to construct a proportional curve. Displacements of the ROC curve above the equal-value line indicate improved detectability of the stimulus, and this is quantified by measuring the area under the ROC curve (AUC). For illusory motion, ROC curves were derived by comparing the firing rate of ±45° conditions with the 0° control condition; for real motion, curves were derived by comparing the firing rate of the 0° control condition with the blank condition.
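The AUC used here can be computed directly from single-trial firing rates without tracing the ROC curve explicitly, because the area equals the probability that a randomly drawn trial from the signal condition has a higher rate than a randomly drawn trial from the comparison condition (ties counted as one half). A minimal sketch with a hypothetical function name and made-up firing rates:

```python
def roc_auc(signal_rates, comparison_rates):
    """Area under the ROC curve from two lists of single-trial firing rates.

    Equivalent to the normalized Mann-Whitney U statistic: the fraction of
    (signal, comparison) trial pairs in which the signal rate is higher,
    with ties contributing 0.5.
    """
    greater = 0
    ties = 0
    for s in signal_rates:
        for c in comparison_rates:
            if s > c:
                greater += 1
            elif s == c:
                ties += 1
    return (greater + 0.5 * ties) / (len(signal_rates) * len(comparison_rates))

# Illusory motion: +/-45 deg condition vs 0 deg control (made-up rates).
auc_illusory = roc_auc([22.0, 25.0, 30.0, 28.0], [18.0, 20.0, 24.0, 19.0])
```

An AUC of 0.5 means the two conditions cannot be told apart from firing rate alone, whereas values approaching 1 indicate increasingly reliable discrimination.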
For the polar plot analysis, different axes represent different combinations of complex-flow motions. Pure radial and rotary motions are located on the cardinal axes, where 0°, 90°, 180°, and 270° refer to pure expansion, CCW rotation, contraction, and CW rotation, respectively. The angles falling in between the cardinal axes constitute the weighted combinations of two neighboring pure complex-flow motions; for example, 30° represents spiral motion with expansion and a little CCW rotation, while 60° represents more CCW rotation and less expansion (Graziano et al., 1994; Heuer and Britten, 2004). For each unit, there are two response cases, one for the +45° condition and the other one for the −45° condition.
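The layout of the polar axes can be written out as a lookup table. The sketch below pairs the 12 radial and 12 rotary speeds listed under Visual stimuli with polar angles in 30° steps. That the two lists are ordered by polar angle starting at 0° is an assumption, although it is consistent with the cardinal-axis assignments given above (0°, expansion; 90°, CCW rotation; 180°, contraction; 270°, CW rotation). The names are hypothetical.

```python
# Speeds as listed in the stimulus description; sign conventions follow the text
# (radial: + expansion, - contraction; rotary: + CW, - CCW).
RADIAL_DEG_S = [5, 3, 1, 0, -1, -3, -5, -3, -1, 0, 1, 3]
ROTARY_DEG_S = [0, -10, -20, -30, -20, -10, 0, 10, 20, 30, 20, 10]
POLAR_ANGLES = [30 * i for i in range(12)]  # assumed ordering: 0, 30, ..., 330 deg

def describe_motion(radial, rotary):
    """Name the pure or spiral complex-flow pattern for a speed combination."""
    parts = []
    if radial > 0:
        parts.append("expansion")
    elif radial < 0:
        parts.append("contraction")
    if rotary < 0:
        parts.append("CCW rotation")
    elif rotary > 0:
        parts.append("CW rotation")
    return " + ".join(parts) if parts else "stationary"

POLAR_AXES = {
    angle: (radial, rotary, describe_motion(radial, rotary))
    for angle, radial, rotary in zip(POLAR_ANGLES, RADIAL_DEG_S, ROTARY_DEG_S)
}

if __name__ == "__main__":
    for angle in POLAR_ANGLES:
        print(angle, POLAR_AXES[angle])
```

Under this ordering, the entry at 30° combines strong expansion with weak CCW rotation and the entry at 60° combines weak expansion with stronger CCW rotation, matching the example in the text.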
Predicted polar plot tuning functions.
Based on the patterns of facilitative and suppressive responses estimated from the PSTHs, we created a simple qualitative prediction of the polar tuning curve shifts for the three Pinna–Brelstaff conditions. Taking a CCW rotation-sensitive unit as an example, for the +45° condition, real expansion (inducing CCW illusion; Fig. 2B) will facilitate the response, whereas real contraction (inducing CW illusion) will suppress it; this will shift the curve rightward toward the expansion axis. Because spiral 1 motion contains expansion and spiral 2 contains contraction (Fig. 6A), these motions will cause a similar increase and decrease in response, respectively. This overall prediction is shown in the magenta curve of Figure 6B (left). Due to the opposite illusory effects seen for the −45° condition (Fig. 2B), a leftward tuning shift toward the contraction axis should be observed (Fig. 6B, left, green curve). This logic also applies to the other radial and rotary complex-flow motion patterns (Fig. 6C).
Spiral motion-sensitive units respond to the combination of radial motion and rotation, and their model predictions are different. Taking a spiral 2 (combined contraction and CCW rotation)-sensitive unit as an example, for the +45° condition, CCW rotation (inducing expansion illusion) and contraction (inducing CW illusion) will suppress the response and reduce the amplitude (Fig. 6D, left, magenta curve). Due to the opposite illusory effects seen for the −45° condition, a facilitation of the response and amplitude increase will be observed (Fig. 6D, left, green curve). This logic also applies to the other spiral complex-flow motion patterns.
Note that all model predictions are qualitative. Both the bandwidth (selectivity) of the tuning preference and the differences in the speed tuning properties of each cell influence the final tuning curve response profile in a way that is not incorporated into the qualitative model. Additionally, the spiral motion group can also exhibit predictable angle shifts when their preferences are biased toward purely radial or rotary motion. Radial and rotary motion groups can also exhibit predictable amplitude differences when they show a slight response to each other. Such effects were small and variable; therefore, we divided MSTd units into three groups (radial, rotary, and spiral motion groups; Fig. 6A) and used Δ preferred angle or Δ amplitude parameters to measure the illusion-mediating properties of each group.
Both the temporal ROC and onset/peak firing rate time analyses were performed from the raw spike train using a 20 ms centered boxcar function (noncausal moving average) shifted with a 1 ms step. Only cases with significant responses (for illusory motion conditions, the significance was measured by comparing the firing rates of illusory stimuli and control stimuli; for real motion conditions, the significance was measured by comparing the firing rates of physical stimuli and blank stimuli) were selected for analysis. For the population tuning, the ±45° conditions were combined. The response onset and peak latencies were calculated by finding the time points at which the firing rate was larger than the baseline response. For the illusory motion conditions (which contain both an illusory motion component and the real motion component that induces the illusory motion), the baseline response was obtained by computing the 99% bootstrap confidence interval of the mean response to the control condition (which contains only a real motion component); for real motion conditions, the baseline was obtained by computing the 99% bootstrap confidence interval of the mean response to the blank condition (which provides a spontaneous response). The response onset latency was determined as the first time point at which the response was larger than the baseline, and the peak latency was determined as the time point at which the response reached its maximum after the response onset. For temporal ROC analysis, AUC values were calculated for each time bin, and the ROC latency was determined as the first time point at which the AUC value was significantly larger than the 0.5 baseline (with the subsequent 10 bins also showing significantly larger values); significance was determined at p < 0.01 after Bonferroni correction. The AUC analysis was performed for both the population and single neurons.
For the population, we first generated the averaged temporal curve from the raw spike trains of all the included cases, then calculated the ROC latencies. For single neurons, we first calculated the ROC latencies for each case, then performed our statistical comparison.
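The latency procedure described above (boxcar smoothing, a bootstrap-derived baseline, then the first crossing) can be sketched as follows. This is an illustration under stated assumptions rather than the analysis code used in the study: the upper bound of the 99% bootstrap confidence interval is taken as the threshold, the number of bootstrap resamples (2000) is arbitrary, spikes are pooled across trials, and all function names are hypothetical.

```python
import random

def boxcar_rate(spike_times_ms, n_trials, t_start, t_stop, width_ms=20, step_ms=1):
    """Trial-averaged firing rate (spikes/s) in a centered boxcar window.

    spike_times_ms: spike times pooled across trials, relative to stimulus onset.
    """
    half = width_ms / 2.0
    times = list(range(t_start, t_stop + 1, step_ms))
    rates = []
    for t in times:
        count = sum(1 for s in spike_times_ms if t - half <= s < t + half)
        rates.append(count * 1000.0 / (width_ms * n_trials))
    return times, rates

def bootstrap_upper_bound(baseline_rates, n_boot=2000, ci=0.99, seed=0):
    """Upper bound of the bootstrap confidence interval of the mean rate."""
    rng = random.Random(seed)
    n = len(baseline_rates)
    means = sorted(sum(rng.choices(baseline_rates, k=n)) / n for _ in range(n_boot))
    return means[int((1.0 + ci) / 2.0 * (n_boot - 1))]

def onset_latency(times, rates, threshold):
    """First poststimulus time point at which the rate exceeds the threshold."""
    for t, r in zip(times, rates):
        if t >= 0 and r > threshold:
            return t
    return None

def peak_latency(times, rates, onset):
    """Time of the maximum rate at or after the response onset."""
    candidates = [(r, -t) for t, r in zip(times, rates) if t >= onset]
    return -max(candidates)[1]  # earliest time point among equal maxima
```

Because the boxcar is centered (noncausal), a response can cross the threshold up to half a window width before the first spike that drives it; this is a property of the smoothing itself and applies equally to all conditions being compared.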
For both MT and MSTd units, the tuning preference for complex-flow motion was measured based on the 0° Pinna–Brelstaff control condition (where no illusory motion can be generated under any physical motion manipulation; Fig. 2B). Since tuning properties may differ when estimated using gratings or random dots (Albright, 1984; Wang and Movshon, 2016), we used the latter only as an additional confirmation of the tuning preference of the neurons.
Experimental design and statistical analysis
All experimental designs, including psychophysical and physiological research paradigms, are described above. For the analysis of the human psychophysical experiments, all nine subjects were included. Each subject's results contain three PSE values, which represent the +45°, −45°, and 0° control conditions. Differences between PSE values were compared with the 0°/s baseline using a Tukey-Kramer post hoc corrected one-way ANOVA. For the nonhuman primate psychophysical experiments, data were collected continuously for 7 d from two macaque monkeys, and statistical analysis was performed separately on each monkey. Differences between PSE values were compared with the 0°/s baseline using a Tukey-Kramer post hoc corrected one-way ANOVA. For electrophysiological recordings, the linear relationship and significance between receptive field size and retinal eccentricity was tested using the Spearman rank correlation. The significance of the facilitative and suppressive responses was calculated using the two-tailed Wilcoxon signed-rank test, which is a nonparametric test for two populations when the observations are paired. The significance of the preferred polar angle and amplitude differences between illusory and control conditions was tested using (nonparametric) permutation analysis. Specifically, we shuffled the trials from both illusory (i.e., five trials for each group) and control (i.e., five trials for each group) conditions. The shuffled trials were then drawn randomly into two new groups with the same number of trials as in the original groups. Two new tuning curves using data shuffled between the two conditions were generated, and the differences in preferred angle or amplitude were computed. This process was repeated 5000 times to generate a permuted distribution, and the p value was calculated as the proportion of Δ values that were larger than the original nonpermuted value (Good, 2005).
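The permutation analysis can be sketched in a few lines. In this illustration the statistic is simply the mean firing rate of each group, standing in for the preferred angle or amplitude extracted from a fitted tuning curve. The comparison uses absolute differences (a two-sided test), which is an assumption, since the text does not state the sidedness. Function names and example rates are hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(illusory, control, statistic=mean, n_perm=5000, seed=0):
    """Proportion of permuted |delta| values larger than the observed |delta|.

    Trials from both conditions are pooled, shuffled, and split into two new
    groups of the original sizes; the statistic difference is recomputed each time.
    """
    rng = random.Random(seed)
    observed = abs(statistic(illusory) - statistic(control))
    pooled = list(illusory) + list(control)
    n_illusory = len(illusory)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        delta = abs(statistic(pooled[:n_illusory]) - statistic(pooled[n_illusory:]))
        if delta > observed:
            exceed += 1
    return exceed / n_perm

# Five trials per condition, as in the text; the rates themselves are made up.
p_separated = permutation_p_value([20, 21, 22, 23, 24], [1, 2, 3, 4, 5])
p_identical = permutation_p_value([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
```

With only five trials per group there are 252 distinct ways to split the pooled trials, so the 5000 random permutations resample this small set many times and the attainable p values are coarse.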
The distribution biases for polar angle and amplitude differences away from zero were tested with the Wilcoxon signed-rank test. The same test was also used in MT for comparing unmasked with masked responses. The Wilcoxon rank sum test, a nonparametric test for two populations when samples are independent, was used for the comparison between illusory and real motion responses (both Δ firing rate and AUC values) in the same brain area, for the comparison between MT and MSTd responses to the same motion patterns (either real or illusory), and for the comparison between MT and MSTd onset/peak latencies. The normality of the distribution for population values was tested using the Kolmogorov–Smirnov test, and nonparametric statistical methods were used to analyze population data that were not normally distributed. For significance testing, results were considered to be different from each other when the p values were <0.01, and multiple comparisons were Bonferroni corrected. All error bars were ±SEM. Tests were performed using the MATLAB statistics toolbox.
Results
Psychophysical measurement of the Pinna–Brelstaff illusion in human and macaque
Previous studies of the Pinna–Brelstaff figure (Bayerl and Neumann, 2002; Gurnsey and Pagé, 2006; Pan et al., 2016) have focused on the perception of illusory rotation elicited by real expansion and contraction of the stimulus (Fig. 1B, left). By comparison, the perception of illusory radial expansion or contraction elicited by rotation of the figure has rarely been addressed (Fig. 1B, right). Here we examined both of these illusory effects in nine human subjects and two rhesus monkeys. We created Pinna–Brelstaff figures that varied only in the relative orientation of the local Gabor patches (+45° and −45°), producing illusions of opposite directions (Fig. 1C; Gurnsey et al., 2002). To confirm that the variation of the Pinna–Brelstaff figures generates complex-flow motion perception for both rotary and radial illusions, and to map the correspondence between the relative orientation of the Gabor and the perceptual outcome, we first ran a 3AFC detection task (Fig. 1D, top; Kingdom and Prins, 2016) on human subjects. They were asked to report whether they did or did not perceive illusory motion and, if they did, in which direction. None of our subjects reported illusory motion under any kind of real motion for the 0° control stimulus (Fig. 2A, middle column). By comparison, each of the ±45° tilted Gabor stimuli generated opposite illusory flow motion perceptions (Fig. 2A, left and right columns). For example, real expansion of the Pinna–Brelstaff figures resulted in illusory CCW rotation for the +45° condition and illusory CW rotation for the −45° condition, whereas real contraction elicited CW and CCW illusory rotations for the +45° and −45° conditions, respectively. Likewise, real CW rotation of the Pinna–Brelstaff figures resulted in illusory contraction for the +45° condition and illusory expansion for the −45° condition (and again this pattern reversed for real CCW rotation of the figures). These results are summarized in Figure 2B.
Illustration of the Pinna–Brelstaff illusion, stimulus parameters, and the 3AFC/2AFC psychophysical paradigm. A, Classical Pinna–Brelstaff rotary illusion. B, Real radial (left) and rotary (right) motion of Pinna–Brelstaff figure produce illusory rotary and radial motions, respectively. C, Illustration of Pinna–Brelstaff figures used in this study, whose local Gabor orientation varied between +45°, 0°, and −45°. D, Schematic illustration of psychophysical 3AFC detection tasks used for human subjects (top) and 2AFC discrimination tasks used for both human subjects and macaque monkeys (bottom).
Illusory flow motion percepts during different physical manipulations of Pinna–Brelstaff figures. A, Results of the 3AFC detection tasks for illusory complex-flow motions in human subjects. Each row represents a physical manipulation of Pinna–Brelstaff figures, and each column represents a stimulus condition (+45°, 0°, −45°). The x-axis delineates the three motion choices. The illusory motion patterns are illustrated by orange icons on the x-axis, and the no-illusion choice is marked as “No.” The y-axis shows the percentage for each perceptual choice. For the real radial motion condition, the three options were CCW illusory rotation, no illusion, and CW illusory rotation. For the real rotary motion condition, the three options were contraction illusion, no illusion, and expansion illusion. B, Summary of the perceived illusory complex-flow motions under different kinds of physical motion manipulations for each stimulus condition (+45°, 0°, −45°).
Based on the subjects' perceptual responses to the three stimulus conditions (+45°, 0°, −45°; Fig. 2B), we then quantified the magnitude of the illusory motion, using a 2AFC Class A direction-nulling discrimination procedure (Brindley, 1970; Kingdom and Prins, 2016; Fig. 1D, bottom; see Materials and Methods). In this task, subjects were instructed to report the perceived direction of rotary or radial motion of a given stimulus figure as a function of the rotary or radial speed physically superimposed onto the illusory stimulus motion. For example, in Figure 3A, the radial expansion stimulus had a fixed radial speed of 5°/s (linear speed), which elicited clear illusory rotation. The speeds of the real rotary motion superimposed onto the illusory rotation are plotted on the abscissa (positive, CW rotation; negative, CCW rotation). The illusory rotation induced by the real expansion stimulus was either strengthened or reduced by the addition of real nulling rotation, leading to a rightward or leftward shift of the psychometric functions. The shifts were dependent on the angle of the tilted Gabor orientation (Fig. 3A, magenta and green curves). We were interested in the speed at which the illusory CW and CCW rotations were cancelled (nulled) by the superimposed real rotation and could no longer be discriminated from each other (50% on the ordinate). We used a modified Bayesian adaptive staircase procedure (Kontsevich and Tyler, 1999; Prins, 2013) to optimally estimate this speed, and the corresponding point on the abscissa is called the PSE. This is a measure of the strength of the induced illusory rotation.
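The nulling logic described above can be illustrated with a minimal sketch. It is not the Bayesian adaptive staircase used in the study. Instead, it assumes a logistic psychometric function and recovers the PSE with a coarse grid-search maximum-likelihood fit. The function names, the grid ranges, and the response counts are hypothetical and serve only to show how the 50% point on the abscissa is read out.

```python
import math

def logistic(speed, pse, slope):
    """Probability of a 'CW' report at a given superimposed real speed."""
    return 1.0 / (1.0 + math.exp(-slope * (speed - pse)))

def neg_log_likelihood(pse, slope, speeds, n_cw, n_total):
    """Binomial negative log-likelihood of the observed response counts."""
    nll = 0.0
    for s, k, n in zip(speeds, n_cw, n_total):
        p = min(max(logistic(s, pse, slope), 1e-9), 1.0 - 1e-9)
        nll -= k * math.log(p) + (n - k) * math.log(1.0 - p)
    return nll

def fit_pse(speeds, n_cw, n_total):
    """Coarse grid search (assumed ranges); returns (pse, slope)."""
    best = None
    for i in range(-400, 401):        # PSE candidates: -4.00 to 4.00
        pse = i / 100.0
        for j in range(1, 61):        # slope candidates: 0.1 to 6.0
            slope = j / 10.0
            nll = neg_log_likelihood(pse, slope, speeds, n_cw, n_total)
            if best is None or nll < best[0]:
                best = (nll, pse, slope)
    return best[1], best[2]

# Hypothetical data: superimposed rotary speed (positive, CW; negative, CCW),
# number of 'CW' reports, and number of trials at each speed.
speeds = [-3, -2, -1, 0, 1, 2, 3]
n_cw = [0, 0, 0, 2, 10, 18, 20]
n_total = [20] * 7

pse, slope = fit_pse(speeds, n_cw, n_total)
print("PSE estimate:", pse, "slope estimate:", slope)
```

In this toy example the fitted curve crosses 50% at a positive speed, meaning that real CW rotation of roughly that speed would be needed to cancel an illusory CCW rotation. Repeating the fit for the 0° control and subtracting the two PSE values would give a Δ PSE of the kind reported in the text.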
The results of 2AFC psychophysical experiments on human subjects and macaque monkeys. A, B, Examples of psychometric functions obtained from human subject LF when tested for illusory rotary and radial motion with the real expansion and CW rotation tasks. Green, black, and magenta represent −45°, 0°, and +45° tilted Gabors. Positive and negative values on the x-axis represent the speed of different types of flow motion patterns as indicated underneath. The circles show the response frequency as a function of physical speed, and the circle sizes represent the number of repetitions of that speed condition in the staircase procedure (see Materials and Methods). C, Box plots show the distributions of individual Δ PSE values from all nine subjects. Real expansion condition on the left and real CW rotation condition on the right. D, E, Examples of psychometric functions from human subject LF, testing perception of illusory rotation using real contraction, and illusory radial motion using real CCW rotation. Same conventions as in A and B. F, Box plots show the distribution of individual Δ PSE values across all nine subjects. Real contraction condition on the left and real CCW rotation condition on the right. G, H, Examples of psychometric functions from a single day of testing in monkey WJ using the same physical manipulations of the same Pinna–Brelstaff figures as those in A and B. I, Box plots show the distributions of Δ PSE values over 7 d from two monkeys, WJ and DX. J, K, Examples of psychometric functions for monkey WJ obtained from a single day with the same physical manipulations of the same Pinna–Brelstaff figures as those in D and E. L, Box plots show the distribution of Δ PSE values over 7 d from two monkeys. Asterisks denote statistical significance at **p < 0.01; ***p < 0.001.
The PSE values were similarly determined for the rotary stimulus (Fig. 3B) eliciting illusory radial expansion and contraction. Here, the real rotary speed was fixed at 30°/s of angular speed (corresponding to a mean linear speed of ∼5.5°/s). Real nulling radial motion (positive, expansion; negative, contraction) was then superimposed onto the illusory radial motion.
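The correspondence between the angular and linear speeds quoted above follows from the relation between angular velocity and radius. As a worked check, and assuming that the quoted mean linear speed is the angular speed multiplied by a mean stimulus eccentricity (a value inferred here, not stated in this section), the numbers imply a mean eccentricity of approximately 10.5°:

```latex
v = \omega\, r, \qquad
\omega = 30^{\circ}/\mathrm{s} \times \frac{\pi}{180^{\circ}} \approx 0.524\ \mathrm{rad/s}, \qquad
\bar{r} \approx \frac{\bar{v}}{\omega} = \frac{5.5^{\circ}/\mathrm{s}}{0.524\ \mathrm{rad/s}} \approx 10.5^{\circ}
```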
We quantified the strength of the illusory effects as the PSE difference between each ±45° condition and the 0° control (Δ PSE). The mean Δ PSE for the ±45° conditions calculated across all nine human observers differs significantly from the control values for both induced illusory rotary and radial motion (Fig. 3C, left: F(2,24) = 59.46; p = 4.65 × 10−5 for −45°; p = 3.17 × 10−5 for +45°; Fig. 3C, right: F(2,24) = 76.45; p = 1.81 × 10−5 for −45°; p = 2.31 × 10−6 for +45°; Tukey-Kramer post hoc corrected one-way ANOVA).
Figure 3D–F shows similar results for contraction and CCW rotation tasks (Fig. 3F, left: F(2,24) = 61.01; p = 2.77 × 10−7 for −45°; p = 4.7 × 10−4 for +45°; Fig. 3F, right: F(2,24) = 37.5; p = 0.0013 for −45°; p = 3.31 × 10−4 for +45°; Tukey-Kramer post hoc corrected one-way ANOVA). These results demonstrate that the Pinna–Brelstaff figures produce robust and, more importantly, predictable illusory effects, consistent with the results of previous psychophysical studies (Pinna and Brelstaff, 2000; Bayerl and Neumann, 2002; Gurnsey et al., 2002; Gurnsey and Pagé, 2006).
We performed similar psychophysical tasks in two monkeys (Fig. 3G–L). The animals were trained to report the motion direction of the stimulus (i.e., CW vs CCW rotation, Fig. 3G; expansion vs contraction, Fig. 3H). Note that the monkeys reported only what they perceived, as they did not know which trials contained the illusory motion conditions. The psychometric functions derived from the monkeys have shallower slopes but exhibit shifts similar to those obtained from the human observers (see examples in Fig. 3G,H). Across sessions, the mean Δ PSE between the ±45° and 0° conditions in both rotary and radial motion discrimination tasks is highly significant in each animal (Fig. 3I, left: F(2,39) = 74.88; −45°: monkey WJ, p = 5.43 × 10−9; monkey DX, p = 8.1 × 10−5; +45°: monkey WJ, p = 2.82 × 10−6; monkey DX, p = 4.76 × 10−5; Fig. 3I, right: F(2,39) = 84.77; −45°: monkey WJ, p = 1.03 × 10−6; monkey DX, p = 7.41 × 10−7; +45°: monkey WJ, p = 1.24 × 10−7; monkey DX, p = 9.82 × 10−5; Tukey-Kramer post hoc corrected one-way ANOVA). Figure 3J–L illustrates analogous results for the contraction and CCW rotation tasks (Fig. 3L, left: F(2,39) = 85.58; −45°: monkey WJ, p = 0.0027; monkey DX, p = 2.57 × 10−4; +45°: monkey WJ, p = 2.65 × 10−4; monkey DX, p = 2.06 × 10−4; Fig. 3L, right: F(2,39) = 32.04; −45°: monkey WJ, p = 2.87 × 10−7; monkey DX, p = 0.0049; +45°: monkey WJ, p = 1.98 × 10−5; monkey DX, p = 1.49 × 10−4; Tukey-Kramer post hoc corrected one-way ANOVA). In summary, these results show that, similar to human observers, nonhuman primates do perceive illusory motions in the Pinna–Brelstaff figures.
Neural responses of area MSTd to Pinna–Brelstaff figures
We next asked how neurons in areas MSTd and MT respond to Pinna–Brelstaff figures. We recorded from 312 well-isolated single units in the same macaques that had performed the psychophysical measurements (MSTd, N = 192; MT, N = 120). MSTd and MT were identified by both anatomical reconstruction of the MRI scans (Fig. 4A) and the retinotopic organization of their neurons (Fig. 4B,C). In general, our MSTd neurons exhibited large RFs (mean ± SEM diameter, 30.85 ± 0.5°), whose visuotopic extent ranged from a hemifield to the whole visual display (Fig. 4B). The sizes of the RFs were uncorrelated with retinal eccentricity (r = −0.013, p = 0.87, Spearman rank correlation). In contrast, MT units usually had smaller RFs that were confined to the contralateral visual field (Fig. 4C). Their RF sizes increased significantly as a function of eccentricity (r = 0.85, p = 1.75 × 10−34, Spearman rank correlation). These results are consistent with previous electrophysiological studies of these visual areas (Gattass and Gross, 1981; Van Essen et al., 1981; Desimone and Ungerleider, 1986; Tanaka et al., 1986).
The MRI scanning results and receptive field properties of MT and MSTd. A, Top row, MRI scanning results of one macaque monkey. The locations of the chamber and electrode are shown in coronal and horizontal planes, and the penetrating guided grid is shown in the aerial view. Bottom row, Illustration of recording sites in the same hemisphere of the same monkey. Blue and orange dots, Electrode tip positions in MT and MSTd areas, respectively. B, Top, Screen-scaled receptive field sizes and locations of MSTd neurons (gray rounded rectangles, RFs of neurons in the right hemisphere; black rounded rectangles, RFs of neurons in the left hemisphere). MSTd neurons have large receptive fields, which spread both ipsilaterally and contralaterally. Bottom, The relationship between retinal eccentricities and RF diameters. Diameters were plotted as a function of eccentricity. The RFs of MSTd neurons are nonretinotopically organized. C, Same conventions as in B. MT neurons have relatively smaller receptive fields, which are contralaterally distributed, and their RFs are retinotopically organized.
The top row of Figure 5A illustrates the responses of an MSTd neuron to the 0° Pinna–Brelstaff control figure under two different complex-flow motion patterns, demonstrating that it favored real CCW rotation (Fig. 5A, third row, shows its responses to real expansion/contraction). We also tested this neuron across three conditions (+45°, 0°, and −45°) using expanding and contracting Pinna–Brelstaff figures. The neural responses are illustrated by the PSTH plots (Fig. 5A). Although neither of these radial motions was the preferred motion pattern for this neuron, the +45° and −45° conditions exhibited significantly enhanced responses (left magenta in the second row, p = 0; right green in the last row, p = 9.75 × 10−21; Wilcoxon signed-rank test) when compared with the 0° condition. Such facilitation of the neural response is consistent with the perceptual results that illusory CCW rotation is produced by real expansion of the +45° and contraction of the −45° Pinna–Brelstaff figure (Fig. 2B).
Results of single-unit recordings from area MSTd in response to physical manipulation of the Pinna–Brelstaff figures. A, Top row, The PSTHs and raster plots show the responses of a real CCW rotation-sensitive MSTd neuron to real rotary motion patterns tested with the 0° control condition. Horizontal black dashed lines, Spontaneous responses. The lower three rows: The PSTHs and raster plots show the same MSTd unit responding to real expansion (left) and contraction (right) of the Pinna–Brelstaff figures. Orange icons show the illusory motion predicted from the psychophysics (Fig. 2B), which corresponds to the facilitative responses of the neuron. B, A real contraction-sensitive MSTd neuron responding to the 0° control condition with radial motion patterns as well as real CW (left) and CCW (right) rotations of the Pinna–Brelstaff figures. Same conventions as in A. C, D, Log axis scatter plots of the distributions of facilitative (cases, N = 264) and suppressive (cases, N = 264) responses of MSTd neurons across all Pinna–Brelstaff stimuli. E, Log axis scatter plot showing the relationship between the strength of facilitative response (Δ facilitative response) and the strength of suppressive response (Δ suppressive response, positive values show more suppression). White circles represent cases without both significant facilitation and suppression; gray circles represent cases with either significant facilitation or suppression; black circles represent cases with both significant facilitation and suppression.
Figure 5B shows another MSTd neuron that prefers real contraction, as measured with the 0° Pinna–Brelstaff control figure. In the null direction (expansion), this neuron exhibits significant suppression (p = 0, Wilcoxon signed-rank test). Testing the ±45° conditions using real CW and CCW rotation elicited significantly enhanced responses (Fig. 5B, left, magenta: p = 0; right, green: p = 4.77 × 10−30; Wilcoxon signed-rank test), when compared with the 0° condition. In addition, real CCW rotation of the +45° figure (Fig. 5B, right, magenta) and CW rotation of the −45° figure (Fig. 5B, left, green) produced significant suppression (−45°, p = 0; +45°, p = 1.01 × 10−33; Wilcoxon signed-rank test). As expected, the facilitated and suppressed responses of this neuron are also consistent with the perceptual results of illusory contraction and expansion generated by the Pinna–Brelstaff figures (Fig. 2B).
We summarized the population results across MSTd units preferring rotary and radial motion (N = 132; Fig. 5C,D). Firing rates in response to the physical manipulation of the Pinna–Brelstaff figures were plotted for the ±45° condition (on the ordinate) against the control 0° condition (on the abscissa). Based on the observations in Figure 2, cases in which the illusory motion pattern matched the preferred real motion pattern of the neuron were plotted in Figure 5C, while cases in which it matched the antipreferred motion pattern were plotted in Figure 5D. Each unit contributed two response cases for each of the preferred and antipreferred directions. For example, real expansion of the +45° condition and contraction of the −45° condition each contributes one case of predicted facilitation for the example unit in Figure 5A, and vice versa for suppression. Therefore, N represents cases, not the number of neurons. In Figure 5C, we found that the overall average response was significantly enhanced by 63.6% for both the ±45° conditions (cases: N = 264, Z = 13.27, p = 3.47 × 10−40, Wilcoxon signed-rank test). Specifically, nearly half of the cases showed significant facilitation (43.94%, 116 of 264 cases; permutation test, p < 0.01; Fig. 5C, filled symbols). In Figure 5D, the overall average response was significantly reduced by 25.4% (cases: N = 264, Z = −9.76, p = 1.67 × 10−22, Wilcoxon signed-rank test). Of those, 28.41% of cases showed significant suppression (75 of 264 cases; permutation test, p < 0.01; Fig. 5D, filled symbols). We also compared the strength of facilitation and suppression for each neuron (Fig. 5E). The strength for both was calculated as the absolute difference between illusory conditions and the control condition (presented as Δ facilitative response and Δ suppressive response).
The cases were divided into the following three groups depending on the significance of their facilitation and suppression effects: cases with neither significant facilitation nor significant suppression (cases: N = 114, 43.2%); cases with either significant facilitation or suppression (cases: N = 104, 39.4%); and cases with both significant facilitation and suppression (cases: N = 46, 17.4%). As expected, the number of significant cases increases as the Δ response strength for facilitation or suppression rises. We found that neurons with stronger facilitation to their preferred illusory motion pattern tend to have stronger suppression to their antipreferred illusory motion pattern, exhibiting a weak but significant positive correlation (r = 0.17, p = 0.0046, Spearman rank correlation). In addition, the strength of facilitation is significantly larger than that of suppression (cases: N = 264, Z = 7.96, p = 1.67 × 10−15, Wilcoxon signed-rank test).
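The case-wise significance labels above rely on permutation tests. The exact procedure is specified in Materials and Methods, so the following is only a generic sketch under stated assumptions: a two-sided test on the difference in mean firing rate between trials of a ±45° condition and trials of the 0° control, with trial labels shuffled at random. The function name, the number of permutations, and the firing rates are hypothetical.

```python
import random

def permutation_p_value(test_rates, control_rates, n_perm=2000, seed=0):
    """Two-sided permutation test on the difference of mean firing rates."""
    rng = random.Random(seed)
    n_test = len(test_rates)
    pooled = list(test_rates) + list(control_rates)

    def mean_diff(values):
        a, b = values[:n_test], values[n_test:]
        return sum(a) / len(a) - sum(b) / len(b)

    observed = abs(mean_diff(pooled))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign trials to the two conditions at random
        if abs(mean_diff(pooled)) >= observed - 1e-12:
            count += 1
    # Add-one correction so that the estimated p value is never exactly zero.
    return (count + 1) / (n_perm + 1)

# Hypothetical single-trial firing rates (spikes/s) for one case.
illusory_trials = [22, 25, 19, 27, 24, 26, 23, 28]
control_trials = [12, 15, 11, 14, 13, 16, 10, 15]

p_value = permutation_p_value(illusory_trials, control_trials)
print("significant facilitation at p < 0.01:", p_value < 0.01)
```

With a finite number of permutations the smallest attainable p value is 1/(n_perm + 1), which is one reason such a test is usually reported against a threshold (here p < 0.01) rather than as an exact value.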
In summary, the facilitation and suppression in MSTd were consistent with psychophysical predictions (Fig. 2B), demonstrating comparable neural selectivity between an illusory complex-flow motion pattern (e.g., illusory CCW rotation) and its physical counterpart (e.g., real CCW rotation). We therefore conclude that a subgroup of MSTd neurons represents illusory complex-flow motions (rotary or radial) elicited by viewing the Pinna–Brelstaff stimulus, suggesting that a fraction of MSTd neurons may contribute directly to the perception of illusory motion.
Tuning properties of MSTd neurons to Pinna–Brelstaff figures
The above neural responses were examined for only one motion axis (i.e., radial or rotary motion). To assess the neural responses mediating illusory flow motions for a more complete stimulus range, including spiral motions (Graziano et al., 1994), we next plotted the data using a two-dimensional polar coordinate system as defined in previous studies (Graziano et al., 1994; Heuer and Britten, 2004; Xu et al., 2014). In these plots, the real radial (expansion and contraction) and rotary (CW and CCW) motions are represented on the horizontal and vertical meridians, respectively. In addition, real spiral motions, which are derived from the weighted combination of radial and rotary motions, are distributed with equal intervals between the two main axes (Fig. 6A). The gray dots in Figure 6A illustrate the distribution of the recorded MSTd neurons in their preferred real complex-flow motion patterns.
MSTd polar-plot tuning for complex-flow motion patterns. A, Canonical complex-flow motion patterns represented in polar coordinates, and the MSTd neuron population distribution of the preferred complex-flow motion pattern plotted against the normalized vector-summed firing. B, An MSTd neuron sensitive to CCW rotation. Left, Qualitatively predicted response tuning curves to physical manipulation of the Pinna–Brelstaff figures (green, −45°; gray, 0°; magenta, +45°). Middle, Neural responses to the Pinna–Brelstaff figures (error bars show the mean firing rate ± SEM), arrows represent the vector sums calculated from the tuning curve, and they agree with the predictions. Right, Response to moving random dot field (error bars show the mean firing rate ± SEM). C, A real expansion-sensitive MSTd neuron in response to the physical manipulation of the Pinna–Brelstaff figures. Same conventions as in B. D, An MSTd neuron sensitive to real spiral 2 motion in response to the physical manipulation of the Pinna–Brelstaff figures. Same conventions as in B.
For a hypothetical neuron sensitive to illusory complex-flow motion, the tuning curves for the ±45° conditions can be predicted to change when compared with the 0° control. For example, Figure 6B shows the polar tuning curves for an MSTd neuron that prefers real CCW rotation, as defined by the 0° control condition (Fig. 6B, middle, gray curve). For a CCW rotation-selective neuron, real expansion of the +45° condition should increase the neuronal response since this stimulus elicits illusory CCW rotation (Fig. 2B). This is reflected in Figure 6B (left, top right quadrant), where real expansion or expansion-containing stimuli like spiral 1 motion (Fig. 6A) result in a rightward shift (firing rate increase) of the magenta curve (+45° condition) compared with the gray curve (0° control). In contrast, during real contraction of the +45° condition, the neural response should decrease because this stimulus elicits illusory CW rotation (Fig. 2B), which is the antipreferred motion for this neuron. This is reflected in the top left quadrant of the left panel in Figure 6B, where real contraction or contraction-containing stimuli like spiral 2 motion result in a rightward shift (firing rate decrease) of the magenta curve compared with the gray curve. Hence, the overall magenta tuning curve is expected to shift rightward (Fig. 6B, left, magenta vs gray curve). Compared with the +45° condition, the −45° condition elicits opposite illusory flow-motion perception under the same physical motion manipulation (Fig. 2B), so the facilitation and suppression effects are reversed and the tuning curve is expected to shift leftward (Fig. 6B, left, green vs gray curve). We calculated the polar direction of the tuning curve for each condition using vector summation, then measured the angular difference between each ±45° condition and the 0° control (Δ preferred angle). Consistent with these predictions, the vector sum of the neuron (Fig. 6B, middle) is significantly shifted rightward by 38.4° for the +45° condition (Fig. 6B, middle, magenta arrow; p = 0, permutation test) and 75° leftward for the −45° condition (Fig. 6B, middle, green arrow; p = 0, permutation test). The right panel in Figure 6B shows the classical motion responses of the neuron to moving dot stimuli. Figure 6C presents tuning curves obtained from a real radial motion (expansion)-sensitive MSTd neuron. Using the same logic as that for the rotary motion-preferring neuron in Figure 6B, the tuning curves are predicted to shift upward and downward for the +45° and −45° conditions, when compared with the 0° control condition (Fig. 6C, left, green and magenta curves vs gray curve). Consistent with this prediction, the vector sum of this neuron is significantly shifted by 22.6° upward and 16.8° downward, respectively (+45°, p = 0; −45°, p = 0; permutation test).
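The vector-sum measure used above can be sketched as follows. This is a generic illustration rather than the study's analysis code. It assumes eight equally spaced stimulus directions in the polar plot, with 0° standing for expansion and 90° for CCW rotation (an assumed axis convention), and the firing rates are invented for the example.

```python
import math

def vector_sum_direction(angles_deg, rates):
    """Polar direction (deg, 0-360) of the firing-rate-weighted vector sum."""
    x = sum(r * math.cos(math.radians(a)) for a, r in zip(angles_deg, rates))
    y = sum(r * math.sin(math.radians(a)) for a, r in zip(angles_deg, rates))
    return math.degrees(math.atan2(y, x)) % 360.0

def signed_angle_diff(a_deg, b_deg):
    """Smallest signed difference a - b, wrapped to [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0

# Assumed convention: 0 = expansion, 90 = CCW rotation, 180 = contraction,
# 270 = CW rotation, with spiral motions in between.
angles = [0, 45, 90, 135, 180, 225, 270, 315]

# Hypothetical tuning curves (spikes/s) for a CCW rotation-preferring unit.
control_rates = [10, 25, 40, 25, 10, 5, 2, 5]     # 0 deg control figure
plus45_rates = [25, 38, 40, 15, 4, 3, 2, 8]       # +45 deg figure

control_dir = vector_sum_direction(angles, control_rates)
plus45_dir = vector_sum_direction(angles, plus45_rates)
delta_preferred_angle = signed_angle_diff(plus45_dir, control_dir)
print("control:", control_dir, "+45 deg:", plus45_dir,
      "delta:", delta_preferred_angle)
```

In this toy case the responses on the expansion side are enhanced and those on the contraction side are reduced, so the vector sum rotates from the CCW axis toward the expansion axis. Under the assumed convention this gives a negative Δ angle, which corresponds to the rightward shift described in the text.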
The shift of the vector sum in the tuning curves is an efficient way to quantify the illusory effects of the Pinna–Brelstaff figures for neurons preferring radial and rotary motions. However, this is not applicable for neurons preferring spiral motion, whose responses can be influenced by both radial and rotary motions (Fig. 6A,D). As a result, their amplitudes instead of vector sums are expected to change (Fig. 6D, left; for a more detailed description, see Materials and Methods). An exemplar neuron preferring spiral motion is shown in Figure 6D (middle), and significant changes in the tuning amplitude (Δ amplitude) were indeed observed for the Pinna–Brelstaff figures (+45°: 12.31, p = 0; −45°: 16.17, p = 0.0002; permutation test).
We next examined the tuning curve changes across the population (rotary, N = 50; radial, N = 82; spiral, N = 60; Fig. 7). The sign of the Δ preferred angle and Δ amplitude was assigned such that positive values represent the Δ changes as predicted from the Pinna–Brelstaff illusory effects (Fig. 2B). Other Δ changes not in line with the prediction were given as negative values. For rotation-preferring neurons (Fig. 7A), the overall distribution of Δ preferred angle is significantly shifted to the right (14.64 ± 1.66°, mean ± SEM; Z = 7.17, p = 7.75 × 10−13; cases, N = 100, Wilcoxon signed-rank test; N represents cases). Within the population, 35% of individual cases (35 of 100 cases; permutation test, p < 0.01) exhibited significant shifts. Similarly, for radial motion-preferring neurons (Fig. 7B), the overall distribution is also significantly shifted rightward (19.32 ± 1.49°, mean ± SEM; Z = 10.09, p = 6.36 × 10−24; cases, N = 164, Wilcoxon signed-rank test). Again, 61.59% of cases (101 of 164 cases; permutation test, p < 0.01) show significant shifts. Finally, for the subpopulation preferring real spiral motion (Fig. 7C), the mean Δ amplitude is 4.27 ± 0.94 spikes/s, which is significantly larger than zero (Z = 4.88, p = 1.08 × 10−6; cases, N = 120, Wilcoxon signed-rank test). Of these, 31.67% of individual cases (38 of 120 cases; permutation test, p < 0.01) showed a significant change, with the majority being in the predicted direction. In summary, these results demonstrate that across all complex-flow motion axes, distinct subgroups of MSTd units exhibit significant changes in their responses consistent with the Pinna–Brelstaff illusory effects.
Population summary of changes in the polar tuning functions during presentation of Pinna–Brelstaff figures. A, Distribution of Δ preferred angle for rotary motion sensitive MSTd units, arrow shows the population mean (black bars, cases that were significantly different with a permutation test at p < 0.01; white bars, remaining cases). B, Distribution of Δ preferred angle for radial motion-sensitive MSTd units; conventions as in A. C, Distribution of Δ amplitude for spiral motion-sensitive MSTd units; conventions as in A.
Neural responses of area MT to Pinna–Brelstaff figures
The neural origin of a visual illusion is often elusive (Spillmann and Werner, 1990). Area MT is an essential motion center in the primate dorsal visual pathway for processing global motion signals (Movshon et al., 1983; Rust et al., 2006). Although MT has previously been shown to be insensitive to complex-flow motions (Tanaka et al., 1986; Smith et al., 2006), area MSTd receives major inputs from MT (Maunsell and Van Essen, 1983; Ungerleider and Desimone, 1986), so it remains important to ascertain how MT itself responds during the perception of illusory complex-flow motions.
To address this, we examined how MT neurons respond to the global stimuli during presentation of the same Pinna–Brelstaff figures. We recorded from 120 MT units, about half of which exhibited significant responses to stimuli during the presentation of the 0° condition (N = 64 of 120). Among these responsive MT units, about a third of the cases (35.94%, 46 of 128 cases; permutation test, p < 0.01; as for MSTd, each MT neuron contributes two cases, so the case number is twice the number of MT neurons) showed significantly increased responses to the ±45° conditions when compared with the 0° control condition (Fig. 8A). The population average response was enhanced by 25.59% (Z = 5.21, p = 1.93 × 10−7; cases, N = 128, Wilcoxon signed-rank test). About one-third of the cases (32.81%, 37 of 128 cases; permutation test, p < 0.01) showed significantly reduced responses (Fig. 8B), with an overall reduction of 23.61% (Z = −5.71, p = 1.11 × 10−8; cases, N = 128, Wilcoxon signed-rank test). Because the origin of both radial and rotary motion is kept at the center of the screen regardless of the RF location of MT neurons, and because MT RFs are relatively small, MT neurons are expected to respond to the local real or illusory motion vectors of the embedded Gabor patches within the Pinna–Brelstaff figures. Despite this difference, MT response properties can still be measured using the same polar plot methodology as used for MSTd. However, for MT neurons the axes represent the local direction of motion of the Gabor elements rather than expansion, contraction, and CW and CCW rotations. We combined the tuned responses distributed on the cardinal axes (Fig. 8C, Δ preferred angle equivalent to radial and rotary motion in MSTd cells), separating them from tuned responses distributed between the cardinal axes (Fig. 8D, Δ amplitude equivalent to spiral motion). The tuning properties indicate that MT neurons can also represent illusory motions, even if they are local.
Results of single-unit recordings of MT in response to the physical manipulation of Pinna–Brelstaff figures. A, B, Log axis scatter plots of distributions of facilitative and suppressive responses across all Pinna–Brelstaff stimuli; cases were determined following the method used for MSTd neurons. C, Population results defining the percentage of a subgroup of MT units with preferred angles distributed around the cardinal polar axes (black bars, cases that were significantly different with a permutation test at p < 0.01; white bars, remaining cases). The arrow shows a population mean of Δ preferred angle for both ±45° stimulus cases of 11.14 ± 2.24° (mean ± SEM; Z = 5.98, p = 2.18 × 10−9; cases, N = 128, Wilcoxon signed-rank test) with 41.41% of cases (53 of 128 cases; permutation test, p < 0.01) showing significant response changes. Cases were determined following the method used for MSTd neurons. D, Population results of Δ amplitude for another group of MT units with preferred angles distributed between the cardinal polar axes; conventions are as in C. Mean amplitude change: 4.03 ± 1.31 spikes/s (mean ± SEM; Z = 2.96; p = 0.003; cases, N = 112, Wilcoxon signed-rank test) with 28.57% of cases (32 of 112 cases; permutation test, p < 0.01) showing significant response changes.
To further examine the presupposition that MT neurons respond to the local illusory motions, we performed a masking experiment for subgroups of MT neurons exhibiting significant responses to the global manipulation of the Pinna–Brelstaff figures. In the masked condition, the Gabor stimuli were presented only within the receptive field of the recorded MT neuron (Fig. 9A). This masking operation effectively removed both the global illusory and real complex-flow motions in the Pinna–Brelstaff figures. Nevertheless, the tuning curves of the MT neuron in response to the +45°, 0°, and −45° conditions were almost identical for the unmasked stimuli (Fig. 9B, left) and the masked stimuli (Fig. 9B, right). For the population, we also found that there was no significant difference between masked and unmasked stimuli (Δ preferred angle: Z = 0.8, p = 0.4209, N = 19; Δ amplitude: Z = 0.32, p = 0.5417, N = 13; Wilcoxon signed-rank test; Fig. 9C). To explain the contribution of local motion components in Pinna–Brelstaff figures to the tuning in MT, we can take the example neuron in Figure 9B and plot the tuning to both random dot complex-flow (Fig. 9D, up) and translational (Fig. 9D, down) motion. The preferred tuning direction for translational motion is consistent with the local direction of the complex-flow motion in the receptive field. We can see that the global motion stimuli with CW rotations will generate a local motion component of left–downward motion inside the receptive field of this unit (Fig. 9E, black arrow). But because of the aperture effect, the direction encoded by the MT neuron is biased to the direction perpendicular to the orientation of the gratings (Fig. 9E, white arrow) compared with the unbiased motion direction. The biased direction (left) is closer to the preference of the MT neuron (left–upward motion; Fig. 9D, down), and, as a result, the neuron shows a stronger response. 
This causes a higher firing rate observed for the +45° condition that is not seen under the 0° control condition (no aperture effect). Consistent with this, the whole tuning curve under the +45° condition is shifted downward, as indicated by the difference in the resultant vectors in the polar plot in Figure 9B. These results demonstrate that MT neurons predominantly encode the local illusory motion signals (presumed to be driven by the aperture effect) embedded in the micropatterns of the Pinna–Brelstaff figures.
The results of masking experiments on MT neurons. A, Unmasked full-field and locally masked Pinna–Brelstaff stimuli used. The red dashed circle in the right panel marks the location and size of the receptive field of an MT neuron. B, Polar plot of the neural responses to unmasked and masked Pinna–Brelstaff stimuli (error bars show the mean ± SEM), showing almost identical response tuning curves for the two conditions. The black arrow icons around the polar axis represent the local linear motion vectors inside the receptive field. C, Left, Scatter plot of Δ preferred angles across MT neurons specifically tested for unmasked and masked conditions. The box plot alongside the scatter plot shows the statistical distribution of Δ preferred angles between unmasked and masked conditions; there were no statistical differences between the two conditions for either Δ parameter. Right, Scatter plot and box plot of Δ amplitudes, using the same conventions as those for Δ preferred angles. D, Up, The response tuning curve of the exemplar neuron to random dot complex-flow motion, giving the strongest response to left–upward local linear motion; under this direction, the global complex-flow motion pattern is contraction. Down, The response tuning curve of the same neuron to translational random dot motion, exhibiting a left–upward (∼150°) preference. E, Illustrated derivation of biased motion direction.
Comparison of MT and MSTd neuronal responses to Pinna–Brelstaff figures
The above results demonstrate that MSTd neurons are able to globally encode illusory complex-flow motion, while MT neurons represent local biased motion. There are known differences in motion preferences and spatial scales of the receptive fields between MT and MSTd, so we next examined whether the sensitivity and selectivity for global illusory motion differ between these two cortical areas. For these analyses, we combined the different complex-flow motions together so that we could directly compare real against illusory motion.
First, we directly compared the Δ firing rates of MT and MSTd neurons encoding illusory motion. The Δ firing rate was calculated by subtracting the firing rate in the 0° control condition from the firing rate in the ±45° conditions (only facilitative response cases were compared). The Δ firing rates of the MSTd neurons (10.68 ± 0.79 spikes/s, mean ± SEM; cases, N = 264) were significantly larger than those of the MT neurons in response to the same unmasked stimuli (Fig. 10A; 5.23 ± 0.92 spikes/s, mean ± SEM; cases, N = 128; Z = 4.53, p = 5.79 × 10⁻⁶, Wilcoxon rank sum test). As a control, we also compared the Δ firing rate for real motion responses (calculated by subtracting the firing rate in the blank stimulus condition from the firing rate in the 0° control condition) between the two areas and found that they were comparable (Fig. 10B; MSTd: 43.80 ± 2.59 spikes/s, mean ± SEM; N = 132; MT: 44.69 ± 3.61 spikes/s, mean ± SEM; N = 64; Z = −0.31, p = 0.7565, Wilcoxon rank sum test).
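The Δ firing rate comparison can be sketched as follows. This is a minimal illustration, assuming a normal approximation to the Wilcoxon rank sum statistic with midranks for ties and without tie or continuity corrections, so it will not exactly reproduce the Z values produced by a full statistical package. The function names and the toy rates are hypothetical.

```python
import math

def delta_firing_rate(test_rate, reference_rate):
    """Firing rate in the test condition minus the reference condition (spikes/s)."""
    return test_rate - reference_rate

def rank_sum_z(a, b):
    """Normal-approximation z for the rank sum of sample a against sample b.

    Tied values receive midranks. No tie correction of the variance and no
    continuity correction are applied (a simplification for illustration).
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2.0 + 1.0   # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(a), len(b)
    mean_w = n1 * (n1 + n2 + 1) / 2.0
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    return (w - mean_w) / sd_w

# Hypothetical (illusory, control) firing rates in spikes/s for a few cases.
mstd_cases = [(52.0, 40.0), (61.0, 48.0), (45.0, 36.0), (70.0, 57.0)]
mt_cases = [(44.0, 40.0), (50.0, 44.0), (38.0, 35.0), (47.0, 42.0)]
mstd_delta = [delta_firing_rate(t, c) for t, c in mstd_cases]
mt_delta = [delta_firing_rate(t, c) for t, c in mt_cases]
z = rank_sum_z(mstd_delta, mt_delta)   # positive when MSTd deltas tend to be larger
```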
Comparison of both illusory and real motion sensitivity between MSTd and MT. A, Population distributions of Δ firing rate for illusory motion. B, Population distributions of Δ firing rate for real motion. C, The population distributions of AUC for both illusory (orange) and real (black) motion. Asterisks denote statistical significance: ***p < 0.001.
We used ROC analysis (Green and Swets, 1966; Britten et al., 1992, 1996; Celebrini and Newsome, 1994; Price and Born, 2010) to estimate how well an ideal observer (here the neuron) could discriminate real and illusory motions based on the neural activity in both brain areas. Specifically, for MSTd neurons the discriminability refers to the global real or illusory complex-flow motions, and for MT neurons it refers to local linear motion directions that are present within the MT receptive field as a part of the global motion pattern. The AUC quantifies the sensitivity of a neuron to discriminate illusory motion against the 0° control. Individual AUC values were calculated for each illusory motion response case in MSTd and MT, and population distributions are illustrated by the orange boxplots in Figure 10C. The mean AUC value for MSTd neurons (0.79 ± 0.01, mean ± SEM; cases, N = 264) is significantly larger than that for MT neurons (0.65 ± 0.03, mean ± SEM; cases, N = 128; Z = 3.52, p = 4.31 × 10⁻⁴, Wilcoxon rank sum test), demonstrating that MSTd has higher sensitivity in discriminating Pinna–Brelstaff illusory motion. A similar comparison was also performed for real motion, and the results are not significantly different between the two areas (MSTd: 0.89 ± 0.02, mean ± SEM; N = 132; MT: 0.93 ± 0.01, mean ± SEM; N = 64; Z = −1.13, p = 0.2571, Wilcoxon rank sum test). Hence, these results indicate that although both MT and MSTd respond to the Pinna–Brelstaff figures, MSTd exhibits relatively higher sensitivity compared with MT.
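As a minimal sketch of the AUC measure, the area under the ROC curve can be computed as the probability that a randomly drawn response from the test condition exceeds a randomly drawn response from the control condition, with ties counted as one half. The function name and the spike counts below are hypothetical; the exact ROC procedure used in the study is the one described in its Materials and Methods.

```python
def roc_auc(test_counts, control_counts):
    """Area under the ROC curve from single-trial responses.

    Equivalent to P(test > control) + 0.5 * P(test == control), evaluated
    over all test/control trial pairs. A value of 0.5 means an ideal observer
    cannot tell the conditions apart; 1.0 means perfect discrimination.
    """
    wins = 0.0
    for t in test_counts:
        for c in control_counts:
            if t > c:
                wins += 1.0
            elif t == c:
                wins += 0.5
    return wins / (len(test_counts) * len(control_counts))

# Hypothetical single-trial spike counts for one response case.
illusory_trials = [14, 17, 12, 19, 15]   # e.g. a +45 deg condition
control_trials = [11, 13, 12, 10, 14]    # 0 deg control
auc_value = roc_auc(illusory_trials, control_trials)
```

The pairwise form is quadratic in the number of trials, which is acceptable for the trial counts typical of single-unit recordings and avoids having to sweep an explicit criterion.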
Integration time window for global illusory complex–flow motions
As demonstrated previously and in the current study, MT neurons have relatively small RFs compared with MSTd neurons and thus are well suited for processing local motion information within their RFs. The outcome of such local motion processing in MT is subsequently pooled by downstream areas such as MSTd to generate complex-flow motion (Layton and Fajen, 2016; Yu et al., 2018). Processing of visual information along a hierarchy of visual areas results in a distribution of temporal responses (Schmolesky et al., 1998). We therefore performed temporal ROC and peak firing rate analysis for both MSTd and MT units to address whether the integration of motion signals that generate illusory motion is similar to or different from real motion.
We first applied temporal sliding-window ROC analysis and calculated the AUC value across the stimulus duration for both real and illusory motion. We measured the population latency as the first significant AUC response above baseline (see Materials and Methods). In MT, the population latencies for real (Fig. 11A) and illusory (Fig. 11B) motion were 54 and 51 ms, respectively. We also compared the latencies at the individual neuron level and found no significant difference (Fig. 11E, right, orange and black boxes; real motion: 66.97 ± 2.44 ms, mean ± SEM; cases, N = 74; illusory motion: 78.73 ± 4.39 ms, mean ± SEM; cases, N = 44; Z = 1.76; p = 0.0783, Wilcoxon rank sum test). In MSTd, the population latency for real motion was 70 ms (Fig. 11C), shorter than that for illusory motion (85 ms; Fig. 11D). At the individual neuron level, the latencies for these two global motions were also significantly different (Fig. 11E, left, orange and black boxes; real motion: 84.25 ± 1.76 ms, mean ± SEM; cases, N = 110; illusory motion: 107.46 ± 3.98 ms, mean ± SEM; cases, N = 124; Z = 4.75; p = 2.06 × 10⁻⁶, Wilcoxon rank sum test). We further compared the latencies at the individual neuron level between MT and MSTd for either real or illusory motion. We found that MSTd neurons take a significantly longer time than MT neurons to reliably discriminate both real and illusory motion (Fig. 11E; real motion: Z = 6.31, p = 2.86 × 10⁻¹⁰; illusory motion: Z = 3.92, p = 8.96 × 10⁻⁵). The temporal ROC result demonstrates that both real and illusory complex-flow motions require a time window to integrate MT local motion signals, allowing MSTd to reliably discriminate global flow-motion patterns. Compared with MT, the integration time that enables MSTd to reliably signal global flow motion is ∼16 ms for real motion and ∼34 ms for illusory motion. 
Since the ROC latencies in MT for the two motion classes are almost the same, an extra 15 ms of integration time is required for illusory motion to be reliably represented in MSTd. The fact that there is no temporal difference between illusory and real motion in MT suggests that the integration mechanisms from its V1 inputs may be similar. We also compared the ROC latencies of MT neurons responding to masked and unmasked stimuli, and we found comparable latency distributions between those two conditions (masked condition: 89.07 ± 7.73 ms, mean ± SEM; cases, N = 15; unmasked condition: 78 ± 9.54 ms, mean ± SEM; cases, N = 11; Z = −1.71; p = 0.0866, Wilcoxon rank sum test). These results further suggest that illusory motion signals are propagated from MT to MSTd.
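The sliding-window ROC latency can be sketched as below. This is an illustrative single-unit version under stated assumptions: the window length, step size, AUC threshold, and the requirement of several consecutive supra-threshold windows are all hypothetical choices, and the latency is reported as the start of the first qualifying window. The study itself defines population latency by a significance test against 0.5 (p < 0.01, Bonferroni corrected), which this sketch does not implement.

```python
def roc_auc(test_counts, control_counts):
    """P(test > control) + 0.5 * P(test == control) over all trial pairs."""
    wins = 0.0
    for t in test_counts:
        for c in control_counts:
            wins += 1.0 if t > c else (0.5 if t == c else 0.0)
    return wins / (len(test_counts) * len(control_counts))

def sliding_auc(test_trials, control_trials, window_ms, step_ms, duration_ms):
    """AUC in successive windows; each trial is a list of spike times in ms.

    Returns a list of (window_start_ms, auc) pairs.
    """
    curve = []
    start = 0
    while start + window_ms <= duration_ms:
        end = start + window_ms
        t_counts = [sum(1 for s in tr if start <= s < end) for tr in test_trials]
        c_counts = [sum(1 for s in tr if start <= s < end) for tr in control_trials]
        curve.append((start, roc_auc(t_counts, c_counts)))
        start += step_ms
    return curve

def first_crossing(curve, threshold, n_consecutive):
    """Start time of the first run of n_consecutive windows with AUC > threshold."""
    run = 0
    for i, (_, auc) in enumerate(curve):
        run = run + 1 if auc > threshold else 0
        if run == n_consecutive:
            return curve[i - n_consecutive + 1][0]
    return None   # criterion never met

# Hypothetical spike times (ms after stimulus onset) for two trials per condition.
test_trials = [[62, 68, 75, 90], [61, 70, 88]]
control_trials = [[], []]
curve = sliding_auc(test_trials, control_trials, window_ms=10, step_ms=5, duration_ms=100)
latency = first_crossing(curve, threshold=0.75, n_consecutive=3)
```

With the population latencies reported above (MT: 54 ms real, 51 ms illusory; MSTd: 70 ms real, 85 ms illusory), the MT-to-MSTd differences are 70 − 54 = 16 ms for real motion and 85 − 51 = 34 ms for illusory motion, and the illusory-minus-real difference within MSTd is 85 − 70 = 15 ms.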
Temporal ROC analysis of AUC for both MSTd and MT responses. A, Temporal ROC analysis of MT neurons to real motion. Averaged black curve (± SEM) shows the change of AUC value throughout the first 200 ms after stimulus onset. The first time point where the AUC value is significantly higher than the 0.5 threshold (p < 0.01, after Bonferroni correction) is marked by a dashed line and a black arrow on the x-axis. B, Temporal ROC analysis of MSTd neurons to real motion. Same conventions as in A. C, Temporal ROC analysis of MT neurons to illusory motion. Same conventions as in A. D, Temporal ROC analysis of MSTd neurons to illusory motion. Same conventions as in A. E, Population distributions of latencies from MT and MSTd representing both illusory (orange) and real (black) motions. F, Population distributions of latencies from MSTd representing both illusory (orange) and real (black) rotary/radial motion. Asterisks denote statistical significance: **p < 0.01, ***p < 0.001.
Considering the differences between rotary and radial motion across visual space, we further compared MSTd latencies to both motion patterns. Interestingly, we found that MSTd neurons respond much faster to real rotary motion than to real radial motion (Fig. 11F, black boxes; real rotary motion: 76.76 ± 3.65 ms, mean ± SEM; cases, N = 38; real radial motion: 86.12 ± 1.95 ms, mean ± SEM; cases, N = 72; Z = −2.91; p = 0.0037, Wilcoxon rank sum test). However, such a difference was not found for illusory flow motions (Fig. 11F, orange boxes; illusory rotary motion: 100.83 ± 7.57 ms, mean ± SEM; cases, N = 44; illusory radial motion: 109.98 ± 4.69 ms, mean ± SEM; cases, N = 80; Z = −0.72; p = 0.4716, Wilcoxon rank sum test).
Figure 12 presents the population-averaged response curves of a temporal analysis for MSTd and MT neurons using onset and peak firing rate. The response onset latency for MSTd illusory motion (∼71 ms) was substantially longer than that for MT (52 ms), as was the peak latency (MSTd, 92 ms; MT, 66 ms; Fig. 12A,B, left column); these differences were consistent with both onset (MSTd, 64 ms; MT, 53 ms) and peak (MSTd, 94 ms; MT, 67 ms) latencies of real motion (Fig. 12A,B, right column). Similar to the ROC temporal analysis, we found that in MSTd, illusory motion needs more time for its representation than real motion (Fig. 12A; illusory motion, 71 ms; real motion, 64 ms), while such differences did not exist in MT (Fig. 12B, illusory motion, 52 ms; real motion, 53 ms). We also measured the onset and peak latencies at the individual neuron level, and the averaged onset latency for illusory motion of MSTd neurons was significantly longer than that of MT (Fig. 12C, left, orange boxes; MT, 58.64 ± 4.18 ms, mean ± SEM; cases, N = 57; MSTd: 69.27 ± 2.27 ms, mean ± SEM; cases, N = 144; Z = 4.19; p = 2.85 × 10⁻⁵, Wilcoxon rank sum test), as was the peak latency (Fig. 12D, orange boxes; MT: 75.47 ± 4.24 ms, mean ± SEM; cases, N = 57; MSTd: 91.65 ± 3.43 ms, mean ± SEM; cases, N = 144; Z = 3.78; p = 1.56 × 10⁻⁴, Wilcoxon rank sum test). These latency differences suggest that MSTd neurons require a time window to globally integrate feedforward inputs of MT local illusory motion. For real motion, as expected, an integration time window is also needed (Fig. 12A,B, right column) when comparing MT and MSTd responses (Fig. 12C,D, black boxes; see statistical summary in the figure legend). We also compared the responses of MSTd neurons between illusory and real motion conditions, and responses of MT neurons between the same motion conditions. We found no statistical difference for the peak latency (Fig. 12D, orange vs black boxes; MSTd: Z = 1.13; p = 0.2573; MT: Z = −0.1; p = 0.9207, Wilcoxon rank sum test). However, for the response onset latency, similar to the temporal ROC statistical analysis, we found that MSTd neurons respond significantly later for illusory than for real motion, while such a difference did not exist in MT (Fig. 12C, orange vs black boxes; MSTd: Z = 3.42; p = 6.32 × 10⁻⁴; MT: Z = 0.95; p = 0.3398, Wilcoxon rank sum test).
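Onset and peak latencies of this kind can be read from a peristimulus time histogram (PSTH), as in the following minimal sketch. The criterion used here is an assumption for illustration only: onset is the first bin after a baseline period at which the rate exceeds the baseline mean plus k standard deviations for several consecutive bins, and the peak is the bin with the maximum rate. The function names, parameters, and the toy PSTH are hypothetical; the study's own criteria are given in its Materials and Methods.

```python
import statistics

def boxcar_smooth(psth, width):
    """Moving average with an odd window width; edges use the available bins."""
    half = width // 2
    out = []
    for i in range(len(psth)):
        lo, hi = max(0, i - half), min(len(psth), i + half + 1)
        out.append(sum(psth[lo:hi]) / (hi - lo))
    return out

def onset_and_peak(psth, bin_ms, baseline_bins, k_sd, n_consecutive):
    """Return (onset_ms, peak_ms), measured from the start of the PSTH array.

    The first baseline_bins bins are assumed to precede any visual response.
    onset_ms is None if the threshold criterion is never met.
    """
    base = psth[:baseline_bins]
    threshold = statistics.mean(base) + k_sd * statistics.pstdev(base)
    onset = None
    run = 0
    for i in range(baseline_bins, len(psth)):
        run = run + 1 if psth[i] > threshold else 0
        if run == n_consecutive:
            onset = (i - n_consecutive + 1) * bin_ms
            break
    peak_index = max(range(baseline_bins, len(psth)), key=lambda i: psth[i])
    return onset, peak_index * bin_ms

# Hypothetical PSTH in spikes/s with 10 ms bins, starting at stimulus onset.
psth = [5, 5, 5, 5, 5, 5, 6, 20, 40, 60, 50, 30]
onset_ms, peak_ms = onset_and_peak(psth, bin_ms=10, baseline_bins=5, k_sd=3.0, n_consecutive=2)
```

In practice the PSTH would be smoothed first (for example with `boxcar_smooth`), which trades temporal precision for robustness against single-bin noise.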
Temporal peak firing rate analysis of MSTd and MT neural responses to both real and illusory complex-flow motions. A, Average PSTH responses (boxcar smoothed, ±SEM) of MSTd subgroups representing illusory and real radial motion. Black arrows on the x-axis mark the time at which the response begins and peaks. Left, Result from ±45° Pinna–Brelstaff figures (illusory motion). Right, Result from 0° control Pinna–Brelstaff figure (real motion). B, Average temporal responses of MT neurons to the same stimuli used for MSTd. Same convention as in A. C, Population data for the response onset latencies of individual MT and MSTd neurons to illusory (orange boxes) or real (black boxes) motion (real MSTd: 60.79 ± 3.35 ms, mean ± SEM; cases, N = 117; real MT: 54.07 ± 2.95 ms, mean ± SEM; cases, N = 85; Z = 2.85; p = 0.0044, Wilcoxon rank sum test). D, Population data for the peak latency (real MSTd: 85.32 ± 4.03 ms, mean ± SEM; cases, N = 117; real MT: 72.32 ± 3.06 ms, mean ± SEM; cases, N = 85; Z = 2.7; p = 0.0069, Wilcoxon rank sum test); same conventions as in C. Asterisks denote statistical significance: **p < 0.01, ***p < 0.001.
From the above results, we conclude that a bottom-up integrative mechanism underlies both real and illusory complex-flow motions encoded by MT and MSTd, but the processing of illusory local motion signals requires additional time to reliably represent global illusory motion (summarized in Fig. 13 as an example for illusory rotation). It would be reasonable to expect that MSTd takes longer to extract a global illusory motion induced by other classes of real flow motion (e.g., illusory rotation, which is induced by real expansion), when compared directly with the same class of real flow motion (e.g., real rotation). Note that we cannot exclude the additional possibility that feedback from higher cortical areas may also contribute to the global integration of complex-flow motion in MSTd (Fig. 13, feedforward accumulation, intracortical computation, and recurrent feedback are illustrated as potential neural mechanisms). These possibilities have been suggested in our recent fMRI study on dynamic network communication accounting for the individual variation of human perception of the same illusory complex-flow motion (Wang et al., 2018). We also cannot exclude the possibility that synchronized population firing of spatially aligned MT neurons could represent both real and illusory complex-flow motions with no latency delays (Fig. 13, middle, MT). However, testing this would require simultaneous multiple single-unit recordings from these spatially aligned MT neurons and is beyond the scope of the current study.
Schematic summary of motion integration from local to global illusory complex-flow motion. Diagram illustrates how an example MSTd CCW rotation-sensitive neuron with a large RF (top) integrates MT local motion inputs (middle) to represent global illusory CCW rotary motion from a physically expanding +45° Pinna–Brelstaff stimulus (bottom). The AUC temporal dynamics of MSTd and MT populations plotted within the MSTd RF (top circle) show the integration time window for illusory motion. The black circles represent MT and MSTd neuron RFs. Small inset on the right of the middle panel illustrates the orthogonal motion signal of MT neurons due to the aperture effect (middle, short orange arrows); such biased illusory motion signals from a group of circularly arranged MT neurons are assumed to be integrated (shown by orange arrows between middle and top panels) by the MSTd neuron, resulting in global illusory CCW rotation (top, long orange arrows and circle).
Discussion
The study of the mismatch between perception and reality helps us to gain a deeper insight into the neural mechanisms of visual perception (Wertheimer, 1912; Gregory, 1972; von der Heydt et al., 1984; Spillmann and Werner, 1990; Eagleman, 2001; Komatsu, 2006; Murray and Herrmann, 2013). Over the past decade or so, physiological explorations of the brain mechanisms underlying illusory perception have relied mostly on fMRI studies in human subjects (Murray and Herrmann, 2013). Correlates of nonmotion illusions resulting in nonveridical perception of contours, angles, and surfaces have been predominantly found in primate ventral visual cortices V1, V2, and V4 (Grosof et al., 1993; Mendola et al., 1999; Lee and Nguyen, 2001; Murray et al., 2002; Stanley and Rubin, 2003; Meng et al., 2005; Montaser-Kouhsari et al., 2007; Fang et al., 2008; Schwarzkopf et al., 2011; Pan et al., 2012; Sperandio et al., 2012; Cox et al., 2013). By comparison, motion illusions such as the waterfall illusion (Tootell et al., 1995), the “rotating snakes” illusion (Ashida et al., 2012), and the flash-drag effect (Maus et al., 2013) have been attributed to the hMT+. Our recent human fMRI studies of the Pinna–Brelstaff illusion show that illusory rotation is predominantly associated with the activation of the subarea MST in hMT+ (Pan et al., 2016; Wang et al., 2018). However, the detailed neural mechanisms underlying the illusory rotation as well as illusory expansion, contraction, and spiral motion remain unknown.
To address this question, we first performed psychophysical experiments on both human and nonhuman primates in response to the same physical manipulations of the same Pinna–Brelstaff figures (Fig. 1C,D). With a nulling procedure, we found shifts in the psychometric functions of both monkeys comparable to those of human subjects, from which we infer that these monkeys perceive illusory complex-flow motions. Subsequent electrophysiological recordings in both MT and MSTd of the same two monkeys, together with masking experiments and ROC temporal analysis, revealed that the local nonveridical motion signals encoded in MT neurons are globally integrated by downstream MSTd neurons to generate various types of complex-flow motion illusions (Fig. 13, example of illusory CCW rotation). Compared with real motion, reliable representation was temporally delayed for illusory motion.
Neural mechanisms underlying the Pinna–Brelstaff illusion
Complex motion illusions, such as the rotating snakes illusion (Kitaoka and Ashida, 2003) and the rotating tilted lines illusion (Gori and Hamburger, 2006), are also largely affected by the arrangement of the local micropatterns within the global static figures. However, the rotating snakes illusion critically relies on subjects making saccades toward the peripheral static ring patterns (Otero-Millan et al., 2012; Kitaoka, 2014) as well as the luminance relationship of the static elements (Conway et al., 2005). The strength of the rotating tilted lines illusion is generally weaker when compared with the Pinna–Brelstaff illusory rotation. And both of these illusions are less easy to experimentally parametrize. Thus, illusory complex-flow motion patterns elicited by the physical manipulation of the Pinna–Brelstaff figures (Fig. 1B) remain the best example for studying the integration of local-motion information into global representation in the primate dorsal visual stream.
Previous studies have suggested that an aperture effect (Marr and Ullman, 1981) may account for the Pinna–Brelstaff illusory rotation (Bayerl and Neumann, 2002; Gurnsey et al., 2002; Gurnsey and Pagé, 2006). The aperture effect is a well known phenomenon in which, for a bar moving obliquely with respect to its orientation, the perceived motion direction through a circular window (or receptive field) is perpendicular to the orientation of the bar. The Pinna–Brelstaff figure is composed of a group of circularly arranged micropatterns, each of which is tilted relative to the radial axis of the global ring. When these micropatterns move radially (expansion/contraction) or circularly (rotation), the perceived motion direction of each tilted element will be biased away from the radial or circular paths due to the aperture effect (Fig. 9E). Consequently, the integration of these locally biased motion vectors would form a global perception of radial or rotary motion that is not veridical. It has been shown that V1 end-stopped cells and MT neurons can “solve” the aperture problem as long as short bars with well defined ends are presented inside their receptive fields (Pack and Born, 2001; Pack et al., 2003, 2004). However, the Gabor patches constituting the Pinna–Brelstaff figures in our study had fuzzy ends (Fig. 1C) and therefore would have reduced the contribution from end-stopped V1 cells in resolving the aperture problem. Future work should attempt to extend the methods used to study the linear/nonlinear integration mechanisms that combine V1 local inputs to MT (Livingstone et al., 2001; Pack et al., 2006; Rust et al., 2006; Richert et al., 2013) toward better understanding of MSTd complex-flow motion responses.
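The geometry of the aperture effect described above can be made concrete with a minimal sketch: through an aperture, only the component of the true velocity perpendicular to the orientation of a bar or grating is visible, so the signaled velocity is the projection of the true velocity onto the unit normal of that orientation. The function names and the example orientation and direction are hypothetical and are not the stimulus parameters of the study.

```python
import math

def aperture_biased_velocity(vx, vy, orientation_deg):
    """Project the true velocity (vx, vy) onto the normal of the grating orientation.

    orientation_deg is the orientation of the bars (0 = horizontal, 90 = vertical).
    Returns the velocity component perpendicular to the bars.
    """
    theta = math.radians(orientation_deg)
    nx, ny = -math.sin(theta), math.cos(theta)   # unit normal to the bars
    dot = vx * nx + vy * ny
    return dot * nx, dot * ny

def direction_deg(vx, vy):
    """Direction of a velocity vector in degrees within [0, 360)."""
    return math.degrees(math.atan2(vy, vx)) % 360.0

# Hypothetical example: vertical bars drifting left-downward (225 deg) at unit speed.
true_vx = math.cos(math.radians(225.0))
true_vy = math.sin(math.radians(225.0))
bx, by = aperture_biased_velocity(true_vx, true_vy, orientation_deg=90.0)
biased_direction = direction_deg(bx, by)   # close to 180 deg (leftward)
biased_speed = math.hypot(bx, by)          # cos(45 deg) of the true speed
```

In this toy case a left-downward physical motion is signaled as leftward, which is the same kind of shift as the bias from the black to the white arrow illustrated in Figure 9E. Summing such biased vectors over circularly arranged elements is what would turn a physical expansion into an apparent rotation.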
Although the aperture effect offers a plausible mechanism in generating the Pinna–Brelstaff illusion, there is room for alternative explanations. For example, a simple model applying an aperture mechanism failed to fully reproduce the strong illusory motion effects observed in human psychophysics (Bayerl and Neumann, 2002). Future work needs to explore a broader range of the stimulus parameters giving rise to the Pinna–Brelstaff illusion and examine how they influence the neural activity in both MT and MSTd. In particular, it will be important for future studies to quantify more precisely the correlations between psychometric and neurometric responses (Parker and Newsome, 1998) during the perception of illusory complex-flow motion. In addition, some MT neurons have an antagonistic surround (Allman et al., 1985; Tanaka et al., 1986; Bradley et al., 1998; DeAngelis and Uka, 2003; Born and Bradley, 2005) and often exhibit strong surround suppression (Born and Tootell, 1992; Born, 2000). It would be interesting to know how strong surround suppression may contribute to this illusory motion perception (Cui et al., 2013; Krause and Pack, 2014).
The integration time window between MT and MSTd
Using moving gratings and plaid stimuli (i.e., component vs pattern motion), previous neurophysiological studies have focused on motion integration from V1 to MT in macaques (Adelson and Movshon, 1982; Movshon et al., 1983; Stoner and Albright, 1992; Rust et al., 2006; Majaj et al., 2007). These studies demonstrate that most V1 neurons carry local motion signals, whereas subgroups of MT neurons tend to represent global motion patterns within their receptive fields (Born and Bradley, 2005; Nassi and Callaway, 2009). Temporal analysis has both identified the switch time from component to pattern motion responses within MT (Pack et al., 2001; Smith et al., 2005) and defined the integration time between components using pseudoplaids (Kumbhani et al., 2015). For complex-flow motion patterns like rotational, radial (expansion/contraction), and spiral motions, previous studies suggest that MSTd neurons may combine MT inputs (Saito et al., 1986; Tanaka et al., 1989; Graziano et al., 1994; Beardsley et al., 2003) via a nonlinear integration mechanism that approximates a multiplicative interaction within subfields of the large MSTd receptive fields (Duffy and Wurtz, 1991; Yu et al., 2010; Mineault et al., 2012). Although a number of studies have measured response latency for MT or MST individually (Raiguel et al., 1989, 1999; Osborne et al., 2004), only two studies that we are aware of have measured response latency in MT and MST simultaneously with simple motion stimuli (Schmolesky et al., 1998; Azzopardi et al., 2003), and only one study incidentally reported values using complex-flow motion (Lagae et al., 1994, their Methods section, p. 1601). Thus, the time window for this hierarchical integration process between MT and MSTd remained elusive. In this study, the neural responses and signal reliability of both MT and MSTd to physical manipulation of the same Pinna–Brelstaff figures across various conditions were directly examined. 
For peak response latency, the time window for the integration from MT local motion inputs to form global flow-motion patterns in MSTd was found to be ∼27 ms. Critically, a temporal ROC analysis identified that an extra time lag of ∼15 ms is needed for MSTd neurons to integrate local nonveridical motion and to reliably discriminate global illusory motion when compared with real motion. As intracortical processing is critical for neural computation (Douglas and Martin, 2004), future laminar analysis of MT and MSTd could help to clarify the exact mechanisms that underlie the extra time lag for the representation of global complex-flow motion illusion.
Concluding remarks
Our results demonstrate that the representation of both real and illusory complex-flow motions in MSTd relies on an integration of bottom-up MT inputs, yet reliable discrimination is temporally delayed for the illusion. A series of studies has reported that several brain areas downstream of MSTd are also sensitive to real rotary or radial complex-flow motions, including area 7a (Sakata et al., 1994; Siegel and Read, 1997), the superior temporal polysensory area (Anderson and Siegel, 2005), and the ventral intraparietal area (Schaafsma et al., 1997). Similar to MSTd, neurons in these areas also have large RFs. It is likely that these higher brain areas also contribute to the perception of complex-flow motion patterns. The question remains as to whether these higher brain areas in the primate dorsal visual stream distinguish between real and illusory motions during active perception.
Footnotes
This work was supported by the Strategic Priority Research Program of Chinese Academy of Sciences, Grant No. XDB32060200; the Shanghai Municipal Science and Technology Major Project, Grant No. 2018SHZDZX05 to W.W.; the National Natural Science Foundation of China Grant No. 31571078 and No. 31861143032 to W.W., Grant No. 31761133014 to Y.G., and Chinese Academy of Sciences Research Project Grant No. GJHZ1735 to I.M.A.
We thank Drs. Lothar Spillmann, Niall McLoughlin, Stewart Shipp, and Muming Poo for valuable comments and suggestions on the manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Yong Gu at guyong{at}ion.ac.cn or Wei Wang at w.wang{at}ion.ac.cn