Abstract
During bistable vision, perception oscillates between two mutually exclusive percepts despite constant sensory input. Greater BOLD responses in frontoparietal cortex have been shown to be associated with endogenous perceptual transitions compared with “replay” transitions designed to closely match bistability in both perceptual quality and timing. It has remained controversial, however, whether this enhanced activity reflects causal influences of these regions on processing at the sensory level or, alternatively, an effect of stimulus differences that result in, for example, longer durations of perceptual transitions in bistable perception compared with replay conditions. Using a rotating Lissajous figure in an fMRI experiment on 15 human participants, we controlled for potential confounds of differences in transition duration and confirmed previous findings of greater activity in frontoparietal areas for transitions during bistable perception. In addition, we applied dynamic causal modeling to identify the neural model that best explains the observed BOLD signals in terms of effective connectivity. We found that enhanced activity for perceptual transitions is associated with a modulation of top-down connectivity from frontal to visual cortex, thus arguing for a crucial role of frontoparietal cortex in perceptual transitions during bistable perception.
Introduction
During bistable vision, perception oscillates between two mutually exclusive percepts despite constant sensory input (Blake and Logothetis, 2002). Perceptual bistability is evoked by a variety of stimuli, such as ambiguous figures like the Necker cube, or binocular rivalry, where different images are presented to each eye. Previous fMRI studies that aimed to identify the neuronal mechanisms underlying bistable perception produced conflicting results and interpretations. Lumer et al. (1998) introduced a nonrivalrous “replay” condition that mimics the participants' perception during binocular rivalry in both perceptual quality and timing and thus creates a matched sequence of perceptual alternations. Greater BOLD responses in frontoparietal cortex were associated with endogenous perceptual transitions, compared with stimulus-induced changes during replay. In keeping with the notion that higher-order areas may trigger perceptual transitions by actively selecting one of the two possible interpretations (Leopold and Logothetis, 1999), it was proposed that neural activity in frontoparietal regions plays a causal role for perceptual transitions. Alternatively, enhanced frontoparietal activity could reflect an effect of stimulus differences that result in, for example, longer durations of perceptual transitions in ambiguous perception compared with replay conditions. Recently, Knapen et al. (2011) followed up on this idea by taking the variability of transition durations into consideration. Comparing BOLD activity related to long transitions with activity related to short transitions, they found the same frontoparietal network to be active as when all perceptual transitions were compared with baseline activity. They concluded that activity in frontoparietal cortex is not the trigger for but rather occurs in response to perceptual transitions.
Here, we sought to probe the effective connectivity among brain regions related to perceptual transitions with Bayesian model comparison in the context of dynamic causal modeling (DCM) for fMRI (Friston et al., 2003, 2013). Using a rotating Lissajous figure, which induces perception of a 3D object spontaneously and instantaneously changing its direction of rotation (Weber, 1930), we controlled for potential confounds of differences in transition duration. Furthermore, perceptual transitions occur most frequently at critical configurations of the Lissajous stimulus, thus making it well suited for the analysis of transition-related brain activity. Based on previous studies suggesting a causal role of frontal (Sterzer and Kleinschmidt, 2007) and parietal regions (Kanai et al., 2011) in perceptual transitions, we hypothesized that enhanced BOLD activity for ambiguous transitions is better explained by top-down compared with bottom-up models of spontaneous changes in bistable perception.
Materials and Methods
Participants.
Twenty right-handed observers participated in this study, which was conducted with local ethics approval at the Berlin Center for Advanced Neuroimaging, Charité Universitätsmedizin Berlin, Germany. Five participants were excluded because of random perceptual transitions during replay in the fMRI experiment, most likely caused by incomplete binocular fusion due to the divider setup used in the fMRI scanner. All remaining 15 participants (eight female, mean age 26.8 years, range 22–33 years) had normal or corrected-to-normal vision, were naive to the purpose of the study, and provided informed written consent.
Stimulus and procedure.
Stimuli were generated with Psychophysics Toolbox 3 (Brainard, 1997) and projected by a Sanyo LCD projector at 60 Hz. There were two block types: ambiguous and replay. In ambiguous blocks, two identical moving Lissajous figures (size 2.05°), formed by the intersection of two sinusoids with perpendicular axes (x(t) = sin(3t); y(t) = sin(6t + δ); with δ increasing from 0° to 360°), were presented separately to the two eyes (Fig. 1A, top). For dichoptic stimulation, a cardboard divider was placed between mirror and screen at the end of the scanner's bore. Participants wore prism glasses to aid binocular fusion. Fixation marks were displayed at the center, and fusion frames surrounded the stimuli. Participants indicated the perceived direction of rotation of the Lissajous figure by pressing a left (clockwise, as seen from the top) or right (counter-clockwise) button with their right hand (Fig. 1B). They were instructed to respond to the first perceived direction after block onset and to all upcoming perceptual transitions, and to report unclear/mixed percepts by pressing the middle button. Replay blocks mimicked the sequence of transitions reported during the preceding ambiguous block with a disambiguated stimulus. To this aim, the two dichoptically presented Lissajous figures were slightly phase-shifted against each other (Fig. 1A, bottom). This offset of ±0.04° acted as a disparity cue disambiguating the stimulus, so that the participants perceived the stimulus as rotating in the direction of the phase shift.
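As a minimal sketch of the stimulus geometry described above, the following Python snippet samples the two sinusoids for one animation frame and builds a pair of figures for the two eyes. It is not the Psychophysics Toolbox code used in the experiment. The function names, the number of samples, and the choice to apply the replay offset as an extra phase term on the right-eye figure only are assumptions made for illustration.

```python
import math

def lissajous_points(delta_deg, n_points=360):
    """Sample x = sin(3t), y = sin(6t + delta) for t in [0, 2*pi)."""
    delta = math.radians(delta_deg)
    points = []
    for i in range(n_points):
        t = 2.0 * math.pi * i / n_points
        points.append((math.sin(3.0 * t), math.sin(6.0 * t + delta)))
    return points

def dichoptic_frame(delta_deg, phase_offset_deg=0.0):
    """Return (left_eye, right_eye) point lists for one animation frame.

    Assumption: the replay offset is added to the phase of the right-eye
    figure only; its sign would select the cued direction of rotation.
    """
    left = lissajous_points(delta_deg)
    right = lissajous_points(delta_deg + phase_offset_deg)
    return left, right

# Ambiguous block: identical figures for both eyes.
ambiguous_left, ambiguous_right = dichoptic_frame(90.0)
# Replay block: figures phase-shifted against each other.
replay_left, replay_right = dichoptic_frame(90.0, phase_offset_deg=0.04)
```

Stepping `delta_deg` from 0 to 360 across frames would produce the apparent rotation; the frame rate needed for a given rotational speed is not specified here.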
Experimental paradigm. A, In ambiguous blocks (top), two identical Lissajous stimuli were presented separately to two eyes, inducing the percept of an object spontaneously changing its direction of rotation. Replay blocks (bottom) were generated by changing disparity cues (gray arrows) at time points matched to the perceptual time course of the preceding ambiguous block. B, Participants reported whether they perceived the Lissajous figure as rotating clockwise (CW) or counter-clockwise (CCW), within a top-view reference frame. Vertical dashed lines denote perceptual transitions between CW and CCW rotation.
In a psychophysical pretest using a mirror stereoscope outside the fMRI scanner, we individually adjusted the rotational speed for every participant to one of three speed levels (0.12, 0.15, and 0.2 Hz) to obtain percept durations of ∼10 s (Philip and Fisichelli, 1945). Participants performed three pretest runs of four pairs of ambiguous blocks and disambiguated replays during which we recorded ratings of perceived transition speeds on a 4-point scale after each block: 1, instantaneous; 2, almost instantaneous; 3, quite instantaneous; 4, prolonged.
Participants completed three fMRI runs of eight pairs of ambiguous and replay blocks presented in an alternating sequence. Block duration was 42.8, 40.90, or 41 s, respectively, depending on rotational speed. Blocks were separated by 10 s fixation. After the experiment, participants answered a debriefing questionnaire concerning their perception (A: Did you have the impression that some blocks were different from others? B: Did you perceive the transitions as instantaneous or prolonged? C: Were you able to tell the direction of rotation of the Lissajous figure at all times during the experiment?). In a localizer fMRI experiment, participants viewed four blocks with moving and four blocks with static Lissajous figures in random order while performing a detection task at fixation. Block duration was 17.6, 14.1, or 15.8 s, respectively, and blocks were separated by 9 s fixation.
Behavioral data analysis.
All perceptual transitions were categorized as being either ambiguous or replay events, depending on whether they occurred during ambiguous or replay blocks. Ambiguous events were determined based on the participant's button presses indicating perceived transitions of rotation. During extensive psychophysical piloting, we found that transitions occur most frequently at overlaps (or self-occlusions) of the Lissajous figure (i.e., at 0° and 180° of rotation), which is in line with previous work (Pastukhov et al., 2012). Therefore, we defined the onset of an ambiguous event as the last overlap preceding the button press. For each perceptual transition reported during an ambiguous block, a change in disparity was generated at the corresponding time point in the subsequent replay block by reversing the phase offset between the two Lissajous figures. If the participant responded to that disparity change, this resulted in a replay event whose onset was defined by the onset of that change.
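The onset rule for ambiguous events can be illustrated with a short sketch. It assumes that the figure rotates at a constant rate and that an overlap coincides with block onset, so that overlaps recur every half rotation; neither assumption is stated in the text, and the function name is hypothetical.

```python
import math

def last_overlap_before(press_time_s, rotation_hz, block_onset_s=0.0):
    """Return the time of the last overlap (0 or 180 deg) before a button press.

    Assumes constant rotation and an overlap at block onset, so overlaps
    occur every half rotation period.
    """
    half_period_s = 1.0 / (2.0 * rotation_hz)
    elapsed_s = press_time_s - block_onset_s
    n_overlaps = math.floor(elapsed_s / half_period_s)
    return block_onset_s + n_overlaps * half_period_s

# At 0.2 Hz, overlaps occur every 2.5 s; a press at 6.0 s maps to 5.0 s.
onset = last_overlap_before(press_time_s=6.0, rotation_hz=0.2)
```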
The matching of events was determined based on the participant's responses during replay. If the participant responded to a disparity change within 2 s after its onset, this replay event and its ambiguous counterpart were defined as matched; otherwise, both events were labeled nonmatched. Likewise, if the participant indicated a perceptual transition in the replay condition without a preceding disparity change, this event was considered as nonmatched.
The onsets of nonmatched replay and nonmatched ambiguous events were defined by subtracting the average response time in the respective condition from the time point of the button press. Average response times were calculated individually for every participant and every run as mean difference between the time points of all button presses and the onsets of the preceding overlaps.
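The matching rule and the onset correction for nonmatched events might be sketched as follows. This is an illustration under stated assumptions rather than the analysis code: each disparity change is paired with the first unused button press inside the 2 s window, and the variable names and example times are invented.

```python
def match_replay_events(change_onsets_s, press_times_s, window_s=2.0):
    """Pair disparity changes with button presses that follow within window_s.

    Returns (matched, unmatched_changes, stray_presses). Assumption: each
    change is paired with the first unused press inside its window.
    """
    matched = []
    unmatched_changes = []
    used = set()
    for onset in change_onsets_s:
        hit = None
        for i, press in enumerate(press_times_s):
            if i not in used and onset <= press <= onset + window_s:
                hit = i
                break
        if hit is None:
            unmatched_changes.append(onset)
        else:
            used.add(hit)
            matched.append((onset, press_times_s[hit]))
    stray = [p for i, p in enumerate(press_times_s) if i not in used]
    return matched, unmatched_changes, stray

def nonmatched_onsets(press_times_s, mean_rt_s):
    """Estimate onsets of nonmatched events as press time minus mean RT."""
    return [p - mean_rt_s for p in press_times_s]

# Hypothetical replay block: three disparity changes, four button presses.
matched, unmatched_changes, stray = match_replay_events(
    [10.0, 20.0, 30.0], [10.8, 23.5, 30.5, 36.0]
)
estimated = nonmatched_onsets(stray, mean_rt_s=0.75)
```

In this example the change at 20.0 s has no response within 2 s, so it and its ambiguous counterpart would be labeled nonmatched, and the two stray presses receive onsets corrected by the mean response time.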
Finally, we determined relative transition probabilities depending on the rotation phase of the Lissajous figure, where 0° and 180° represent overlaps of the sinusoids composing the stimulus (Fig. 2A). Separately for the ambiguous and replay condition, transition onsets were calculated by subtracting the average response time during replay (defined as the latency between disparity change and button press) from the time points of the button presses. All values for this calculation were expressed in degrees of rotation of the Lissajous figure.
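A minimal sketch of this phase-resolved calculation is given below. The bin width of 30° and the centering of bins on the overlaps are assumptions chosen for illustration; the binning used for Figure 2A is not specified in the text.

```python
def transition_phase(press_phase_deg, mean_rt_deg):
    """Shift a button-press phase back by the mean response time (in degrees)."""
    return (press_phase_deg - mean_rt_deg) % 360.0

def phase_probabilities(phases_deg, bin_width_deg=30.0):
    """Relative transition probability per phase bin.

    Bins are centered on multiples of bin_width_deg, so that 0 and 180 deg
    (the overlaps) fall at bin centers. The bin width is an assumption.
    """
    n_bins = int(round(360.0 / bin_width_deg))
    counts = [0] * n_bins
    for phase in phases_deg:
        shifted = (phase + bin_width_deg / 2.0) % 360.0
        counts[int(shifted // bin_width_deg) % n_bins] += 1
    total = len(phases_deg)
    if total == 0:
        return [0.0] * n_bins
    return [c / total for c in counts]

# Hypothetical button-press phases and a mean response time of 30 deg.
phases = [transition_phase(p, 30.0) for p in [30.0, 35.0, 25.0, 210.0, 120.0]]
probabilities = phase_probabilities(phases)
```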
fMRI acquisition and preprocessing.
BOLD images were acquired by T2*-weighted gradient-echo echo-planar imaging (FOV 192, 33 slices, TR 2000 ms, TE 30 ms, flip angle 78°, voxel size 3 × 3 × 3 mm, interslice gap 10%) on a 3T MRI scanner (Tim Trio, Siemens). We recorded 402 (0.15 and 0.2) or 415 (0.12 Hz) volumes, respectively, for each of the three experimental runs, and 310 (0.12), 270 (0.15), or 290 (0.2 Hz) volumes, respectively, for the localizer run. Anatomical images were acquired using a T1-weighted MPRAGE sequence (FOV 256, 160 slices, TR 1900 ms, TE 2.52 ms, flip angle 9°, voxel size 1 × 1 × 1 mm). We used statistical parametric mapping (SPM8; http://www.fil.ion.ucl.ac.uk/spm/software/spm8/) for image preprocessing (standard realignment, coregistration, normalization to MNI stereotactic space using unified segmentation, spatial smoothing with 8 and 10 mm full-width at half-maximum isotropic Gaussian kernels for single-subject and group analyses, respectively). For effective connectivity analyses using DCM, images were slice-time corrected with reference to the first slice.
fMRI data analysis: general linear model (GLM).
Using a GLM approach (Friston et al., 1994) with a mixed event-related and block design, perceptual transitions were modeled as events using a stick function convolved with the canonical hemodynamic response function implemented in SPM8. Fixation screens were modeled as blocks using a boxcar function convolved with the canonical hemodynamic response function. For the main experiment, the model included five regressors: “Matched ambiguous transition,” “Matched replay transition,” “Nonmatched ambiguous transition,” “Nonmatched replay transition,” and “Fixation.” The localizer experiment was analyzed using three regressors: “Moving Lissajous,” “Static Lissajous,” and “Fixation.” In both experiments, the design matrix included the temporal derivatives of the canonical hemodynamic response function and six rigid-body realignment parameters as nuisance covariates. After high-pass filtering at 1/128 Hz, we estimated single-subject statistical parametric maps, then created contrast images, and entered these into one-sample t tests at the group level. Anatomic labeling of cluster peaks was performed using the SPM Anatomy Toolbox Version 1.7b (Eickhoff et al., 2005).
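The construction of an event regressor can be illustrated with a simplified sketch. It is not SPM8 code: the response function is a double-gamma approximation (peak shape 6, undershoot shape 16, undershoot ratio 1/6, unit scale), event onsets are rounded to the nearest scan rather than handled at finer temporal resolution, and all names are hypothetical.

```python
import math

def gamma_pdf(t, shape):
    """Gamma density with unit scale; zero for t <= 0."""
    if t <= 0.0:
        return 0.0
    return t ** (shape - 1.0) * math.exp(-t) / math.gamma(shape)

def double_gamma_hrf(t):
    """Double-gamma approximation of a canonical HRF (assumed parameters)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

def event_regressor(onsets_s, n_scans, tr_s=2.0, hrf_length_s=32.0):
    """Convolve a stick function (one stick per event) with the HRF.

    Simplification: onsets are rounded to the nearest scan.
    """
    sticks = [0.0] * n_scans
    for onset in onsets_s:
        scan = int(round(onset / tr_s))
        if 0 <= scan < n_scans:
            sticks[scan] += 1.0
    n_kernel = int(hrf_length_s / tr_s) + 1
    kernel = [double_gamma_hrf(k * tr_s) for k in range(n_kernel)]
    regressor = [0.0] * n_scans
    for i, stick in enumerate(sticks):
        if stick == 0.0:
            continue
        for k, h in enumerate(kernel):
            if i + k < n_scans:
                regressor[i + k] += stick * h
    return regressor

# One transition at 10 s, sampled at TR = 2 s over 30 scans.
regressor = event_regressor([10.0], n_scans=30)
```

With these parameters the predicted response is zero up to the event, rises to a maximum a few scans later, and then dips below baseline during the undershoot.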
For ROI analyses, voxels responding to perceptual transitions were identified as follows: We mapped the group-level contrast “All transition events > baseline” at p < 0.0001, uncorrected, and defined search spheres of 10 mm radius around local cluster peaks with the highest t values in the right inferior frontal gyrus (IFG, [57, 17, 10], t(14) = 6.60), right inferior parietal lobule (IPL, [48, −34, 49], t(14) = 6.91), and right human motion complex (hMT+, [51, −67, 4], t(14) = 5.21). These regions were not the only clusters in this map (Fig. 3A) but were selected based on a priori knowledge from previous studies (e.g., Lumer et al., 1998; Sterzer and Kleinschmidt, 2007; Kanai et al., 2011). The coordinates of hMT+ were confirmed by the localizer map “Moving > Static” ([45, −67, 4], t(14) = 11.46). Left supplementary motor area (SMA, [−6, −1, 49], t(14) = 12.96) was selected as a control region. Individually for each participant, we then created four ROIs by selecting all voxels within the corresponding search spheres that passed a lenient threshold (p < 0.05, uncorrected) for the same contrast at the single-subject level. Parameter estimates were calculated using the MarsBaR Toolbox 0.42 (http://marsbar.sourceforge.net/), averaged across all ROI voxels, and submitted to two-sided paired t tests. p values were Bonferroni-corrected for the number of ROIs.
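Two steps of the ROI procedure, the 10 mm search sphere and the Bonferroni correction, can be sketched as follows. This is an illustration, not the MarsBaR workflow; the candidate voxel coordinates and the uncorrected p values are invented, and only the IFG peak coordinate is taken from the text.

```python
import math

def voxels_in_sphere(voxel_coords_mm, peak_mm, radius_mm=10.0):
    """Keep voxels whose Euclidean distance from the peak is within the radius."""
    return [v for v in voxel_coords_mm if math.dist(v, peak_mm) <= radius_mm]

def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]

ifg_peak = (57, 17, 10)  # right IFG peak from the group map
# Hypothetical candidate voxels (MNI, mm).
candidates = [(57, 17, 10), (60, 20, 13), (66, 17, 10), (57, 29, 10)]
roi = voxels_in_sphere(candidates, ifg_peak)

# Hypothetical uncorrected p values for four ROIs.
corrected = bonferroni([0.01, 0.02, 0.3, 0.5])
```

In the actual procedure, voxels inside the sphere were additionally required to pass the single-subject threshold of p < 0.05, uncorrected, which this sketch omits.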
fMRI data analysis: DCM.
We adopted a DCM approach to study the connectivity structure of the extracted regions and its modulation by perceptual events (Friston et al., 2003). Specifically, we aimed to characterize ambiguous events as driving or modulatory inputs to a network consisting of right hMT+ and right IFG (Sterzer and Kleinschmidt, 2007). For these regions, we extracted eigenvariate time courses from ROI voxels while adjusting for effects of interest (p < 0.001, uncorrected). These time courses are the first principal component of the local multivariate time series over all ROI voxels and represent a summary of their activity. All DCMs considered were bilinear, one-state-per-region models without stochastic effects or center input. For all models, we defined reciprocal intrinsic connections between right hMT+ and right IFG. The dynamics of this network were allowed to be affected by inputs modeled by the hierarchical regressors “Rotation” (R, block regressor for Lissajous rotation), “Perceptual transition” (T, all changes in perceived rotation indicated by the participant in ambiguous and replay blocks), and “Matched ambiguous transition” (MAT, all events contained in the corresponding GLM regressor). Based on previous findings, “Rotation” and “Perceptual transition” were considered as main driving inputs to hMT+ (Freeman et al., 2012).
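For orientation, the bilinear state equation underlying DCM for fMRI (Friston et al., 2003) can be written out for the two-region network. The element labels below are introduced for illustration only, the matrix entry in row i and column j denotes the connection from region j to region i, and the modulatory matrix is shown for the case in which MAT modulates the connection from IFG to hMT+, which is one of the variants considered.

```latex
\dot{z} = \Bigl(A + \sum_{j} u_j B^{(j)}\Bigr) z + C u,
\qquad
z = \begin{pmatrix} z_{\mathrm{MT}} \\ z_{\mathrm{IFG}} \end{pmatrix},
\quad
u = \begin{pmatrix} u_{R} \\ u_{T} \\ u_{\mathrm{MAT}} \end{pmatrix}

A = \begin{pmatrix}
  a_{\mathrm{MT \leftarrow MT}} & a_{\mathrm{MT \leftarrow IFG}} \\
  a_{\mathrm{IFG \leftarrow MT}} & a_{\mathrm{IFG \leftarrow IFG}}
\end{pmatrix},
\quad
B^{(\mathrm{MAT})} = \begin{pmatrix}
  0 & b_{\mathrm{MT \leftarrow IFG}} \\
  0 & 0
\end{pmatrix},
\quad
C = \begin{pmatrix}
  c_{R} & c_{T} & 0 \\
  0 & 0 & 0
\end{pmatrix}
```

Here A holds the reciprocal intrinsic connections, C routes the driving inputs R and T to hMT+, and a nonzero entry in B lets MAT change the strength of one connection whenever a matched ambiguous transition occurs.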
Our DCM model space was 3D and consisted of 3 × 4 × 3 = 36 models. Along all dimensions, groups of models were compared using Bayesian model family comparison (Penny et al., 2010). Along the first dimension, we varied the target region for the driving input from regressor T. Models with input to right hMT+ showed the highest exceedance probability (54.45%), compared with models with input to right IFG (12.84%), or to both hMT+ and IFG (32.71%). The remaining model space dimensions, which are explored in more detail in Results, were constructed by systematically varying the input from the MAT regressor to test for ambiguity-related effects over and above the general transition-related activity modeled by the T regressor (Fig. 4A). Thus, the MAT regressor was used to assess the modulation of transition-related activity by ambiguity relative to the replay condition. Specifically, along the second model space dimension, it could act as additional driving input either to right hMT+ or to right IFG, to both, or to none of the two regions. Along the third dimension, this regressor could modulate the connection from right hMT+ to right IFG (bottom-up), the connection from right IFG to right hMT+ (top-down), or both (reciprocal). Within the winner family of the third dimension (related to our hypothesis), model evidence was compared via model exceedance probabilities using Bayesian model selection with a random-effects analysis to account for interindividual differences (Stephan et al., 2009).
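The factorial structure of this model space can be made explicit with a short enumeration. The sketch only lists the 36 combinations and groups them into families; it does not estimate any model, and the dictionary keys and labels are hypothetical. The order of enumeration is arbitrary and does not reproduce the model numbering used in Figure 4.

```python
from itertools import product

# The three dimensions of the model space described in the text.
T_INPUT = ("hMT+", "IFG", "both")                      # driving input from T
MAT_INPUT = ("hMT+", "IFG", "both", "none")            # additional driving input from MAT
MAT_MODULATION = ("bottom-up", "top-down", "reciprocal")  # connection modulated by MAT

model_space = [
    {"t_input": t, "mat_input": m, "mat_modulation": b}
    for t, m, b in product(T_INPUT, MAT_INPUT, MAT_MODULATION)
]

def family(models, key, value):
    """Return all models that share one level of one dimension."""
    return [model for model in models if model[key] == value]

top_down_family = family(model_space, "mat_modulation", "top-down")
no_mat_input_family = family(model_space, "mat_input", "none")
```

Family comparison then pools evidence over all models in a family, so each family along the third dimension contains 12 models and each family along the second dimension contains 9.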
Results
Behavioral results
As predicted, perceptual transitions in the ambiguous condition most frequently occurred at 0° and 180° of rotation of the Lissajous figure (i.e., when there was an overlap of the stimulus; Fig. 2A). Of all overlaps, 42% were accompanied by transitions, and 90% of all button presses fell within the 2 s response window (median response latency, 0.75 s). Mean percept duration was 9.3 s (median, 7.1 s). Replay transitions occurred at virtually identical phases because they were temporally matched to ambiguous events by the change of disparity cues at corresponding time points. Mixed or otherwise unclear perception of the stimulus was very rare: 0.34% of all button presses in the ambiguous condition were “mixed” responses, and 0.10% in the replay condition. The small number of responses (5 total) precluded any further statistical analysis of “mixed” percept durations.
Behavioral results. A, fMRI experiment: relative transition probability depending on the phase of Lissajous rotation (0–360°; overlaps at 0° and 180°) in ambiguous and replay condition. The panels represent mean transition probabilities across all participants. The inner dashed circles of the polar plots indicate 5%; the outer circles indicate 10%. B, Psychophysical pretest: relative frequency of perceptual transition speed ratings (1, instantaneous; 2, almost instantaneous; 3, quite instantaneous; 4, prolonged) for ambiguous and replay blocks. Error bars indicate SEM.
Figure 2B summarizes the participants' ratings of transition speeds during the pretest. Across all speed ratings, we found no significant effect of condition (F(1,14) < 1), but transitions were slightly more frequently rated as “instantaneous” in the ambiguous than in the replay condition (t(14) = 3.70, p = 0.002).
No participant reported any differences between block types during debriefing. One participant remarked that some perceptual time courses appeared similar to the preceding block, yet no participant noticed that there were ambiguous and stimulus-induced (replay) events. All participants described the transitions as being instantaneous as opposed to prolonged. They reported that they were able to tell the direction of rotation of the Lissajous figure at all times.
fMRI results: GLM
We first mapped activity corresponding to changes in the perceived direction of rotation of the Lissajous stimulus, regardless of whether the perceptual transitions were of endogenous nature (ambiguous) or stimulus-induced (replay; Fig. 3A). Similar to previous reports (Lumer et al., 1998; Sterzer and Kleinschmidt, 2007; Knapen et al., 2011), the contrast “all perceptual transitions > baseline” revealed significant clusters in right middle frontal gyrus, right precentral gyrus, bilateral inferior frontal lobe, right hMT+, and right inferior parietal lobule, as well as clusters in left somatosensory cortex related to the motor responses. Transition-related activity was also present in left hMT+ (additional ROI analysis: t(14) = 3.45, p = 0.004) but did not survive map thresholding, owing to larger signal variability. Figure 3B shows that the contrast “matched ambiguous events > matched replay events” yielded clusters in bilateral middle and inferior frontal, right superior parietal areas, as well as in bilateral hMT+. Next, we tested whether our ROIs, which were based on the first contrast, responded more strongly to matched ambiguous compared with matched replay events. GLM parameter estimates were significantly larger for ambiguous than for replay events in right hMT+ (t(14) = 2.91, p = 0.046) and right IFG (t(14) = 3.30, p = 0.021), whereas a similar numerical difference in right inferior parietal lobule did not survive correction for multiple tests (t(14) = 2.21, p = 0.178; Fig. 3C). We tested for spatial specificity of this effect by analyzing parameter estimates in response-related left supplementary motor area, which were not significantly different (t(14) < 1). As a control for temporal specificity and possible block-related (rather than transition-related) effects, we randomly shuffled event onsets for every participant and every run among all Lissajous overlaps within a given block. If the difference observed for ambiguous versus replay events was the result of block-related effects, the event-related ambiguity effect should be immune to such temporal shuffling. This was not the case: the corresponding reestimated GLMs yielded no significant differences between the ambiguous and replay condition in any of the ROIs (all p > 0.940).
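The shuffling control can be sketched as follows. The sketch assumes that shuffled onsets are drawn without replacement from the overlap times of the same block and that the number of events per block is preserved; the text does not specify these details, and the example times are invented.

```python
import random

def shuffle_onsets_within_block(event_onsets_s, overlap_times_s, seed=None):
    """Reassign event onsets to randomly chosen overlaps of the same block.

    Assumptions: sampling without replacement, event count preserved.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(overlap_times_s, len(event_onsets_s)))

# Hypothetical block at 0.2 Hz: overlaps every 2.5 s, three reported transitions.
overlaps = [2.5 * k for k in range(16)]
events = [5.0, 12.5, 30.0]
shuffled = shuffle_onsets_within_block(events, overlaps, seed=1)
```

Because the shuffled events stay within the block and remain locked to overlaps, any purely block-related difference between conditions would be preserved, whereas an effect tied to the actual transition times would not.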
fMRI-GLM results. A, The contrast “all perceptual transitions > baseline,” thresholded for visualization purposes at p < 0.0001, uncorrected. B, The contrast “matched ambiguous events > matched replay events,” thresholded for visualization purposes at p < 0.005, uncorrected. Both contrasts are mapped onto the lateral aspects of an inflated cortical surface of a canonical average brain. Color bars indicate t values (df = 14). C, ROI analysis revealed significantly larger parameter estimates (in arbitrary units [a.u.]) for matched ambiguous events than for matched replay events in right hMT+ and right IFG. *p < 0.05 (corrected for multiple tests). Error bars indicate SEM.
fMRI results: DCM
We first applied Bayesian model family comparison to compare models with additional driving input from “Matched ambiguous transition” (MAT) to right hMT+, right IFG, both, or none. Figure 4B (left) shows that models without additional input had the highest log-evidence (exceedance probability 96.58%). We then divided the DCM model space along its modulatory dimension and found that models characterized by top-down modulation showed the highest exceedance probability (99.43%; Fig. 4B, middle). Bayesian model selection identified model 11 as the winner (95.37%) within the top-down family (Fig. 4B, right). In addition to the basic structure common to all models, consisting of rotation and all perceptual events as driving inputs to hMT+, this specific model was characterized by matched ambiguous events acting as modulatory input on the top-down connection from IFG to hMT+. In line with our a priori hypothesis, we found this modulatory effect to be on average significantly greater than zero (0.06 ± 0.05, mean ± SEM; z = −1.76, p = 0.039, one-tailed), using Wilcoxon's signed rank test due to significant deviation from normal distribution (Shapiro–Wilk: p < 0.05). As a control analysis, we constructed the same 3D DCM model space again but used the “Matched replay transition” (MRT) regressor in place of the MAT regressor and tested which family of models (MRT vs MAT) better described our fMRI data using Bayesian model comparison. Instead of probing the modulation by ambiguous events, this new family tested modulation by replay events. The result clearly indicated that MAT models are superior to MRT models (exceedance probability 71.38%).
fMRI-DCM results. A, DCM model space was constructed by systematically changing the influence of MATs on the network. Out of the 3D model space, only the winning model family of the first dimension (driving input from regressor T to target region MT) is shown to yield a 2D representation. B, Left, Exceedance probabilities for groups of models with additional driving input from MAT to region MT, IFG, both, or none. Middle, Exceedance probabilities for groups of models with bottom-up, top-down, or reciprocal modulation. Right, Exceedance probabilities of models within the top-down family. Essentially identical results were obtained when the MAT regressor included matched and nonmatched transitions. MAT, “Matched ambiguous transition” regressor; T, “perceptual transition” regressor; R, “rotation block” regressor; MT, human motion complex hMT+; IFG, inferior frontal gyrus.
Discussion
We investigated fMRI-BOLD activity associated with perceptual transitions of a bistable rotating Lissajous figure. Confirming earlier findings (e.g., Lumer et al., 1998; Zaretskaya et al., 2010), we show that spontaneous transitions during ambiguity evoke greater BOLD responses in frontoparietal cortex than stimulus-induced transitions during replay. Common interpretations of these activity differences rest on the assumption that these activations reflect differences in the nature of the involved events: whereas ambiguous events arise spontaneously because of some properties of the underlying neural circuitry, replay events are induced by changes in stimulus content. Alternatively, confounds arising from the low-level visual properties of both stimulus types (e.g., differences in duration or perceptual quality of transitions between rivalry and replay) could account for the observed effects. In line with the latter scenario, a recent fMRI study proposed that frontoparietal cortex activates in response to differences in perceptual transitions, based on the finding of greater frontoparietal activity during long versus short transitions (Knapen et al., 2011). Here, we therefore aimed at precluding differences in transition duration as a confounding factor by using the Lissajous figure, in which mixed percepts or gradual transitions are hardly ever reported. The fact that perceptual transitions of the Lissajous figure most frequently occur at critical stimulus positions (an important feature distinguishing it from other multistable patterns) facilitated the construction of a matched replay condition. Indeed, our careful behavioral assessment confirmed that participants perceived ambiguous and replay transitions as being highly similar. If anything, transition durations were rated as slightly longer in the replay condition, thus clearly ruling out longer perceptual transitions in the ambiguous condition as an explanation for any of our findings.
Based on a correlative fMRI analysis alone, however, one cannot draw any conclusions on whether the observed difference in brain activity is brought about by bottom-up or top-down effects, or a mixture of both, as one could explain the observed BOLD data in terms of all three scenarios. We aimed to disentangle these hypotheses by fitting corresponding dynamic causal models to the signal time courses from right hMT+ (visual input region), and right IFG (part of the frontoparietal network that showed a robust activation difference between ambiguous and replay events) (see also Sterzer and Kleinschmidt, 2007). Effective connectivity analysis showed that the probability of top-down models, in which ambiguous transition events act as modulatory inputs to the connection from IFG to hMT+, dramatically exceeded the probability of bottom-up and reciprocal models. This suggests crucial involvement of right IFG activity in perceptual transitions during bistable perception (e.g., by mediating a reorganization of activity in visual cortex) (Leopold and Logothetis, 1999; Sterzer et al., 2009).
Together, our data argue against the notion of frontoparietal activity being a mere consequence of perceptual conflicts resolved at the level of visual cortex, but are in line with evidence from lesion studies, chronometric fMRI, transcranial magnetic stimulation, and the analysis of prestimulus electroencephalographic signals, suggesting a causal role of right frontal (Meenan and Miller, 1994; Sterzer and Kleinschmidt, 2007) and parietal (Britz et al., 2011; Kanai et al., 2011) cortex in perceptual transitions. To formulate an exhaustive model, follow-up studies need to integrate time-resolved neurophysiological data and characterize to what degree the interplay between frontoparietal and visual cortex is further shaped by ongoing low-level (Hesselmann et al., 2008) and high-level (Alink et al., 2010) visual processes during bistable perception.
Footnotes
G.H. was supported by the German Research Foundation (Grant HE 6244/1-1). K.L. and V.A.W. were supported by the Studienstiftung des deutschen Volkes (German National Academic Foundation). We thank Katharina Schmack for her idea to revive the Lissajous figure.
The authors declare no competing financial interests.
Correspondence should be addressed to either Dr. Philipp Sterzer or Dr. Guido Hesselmann, Visual Perception Laboratory, Department of Psychiatry and Psychotherapy, Campus Charité Mitte, Charité Universitätsmedizin, 10117 Berlin, Germany, philipp.sterzer@charite.de or guido.hesselmann@charite.de