Abstract
Humans constantly receive massive amounts of information, both perceived from the external environment and imagined from the internal world. To function properly, the brain needs to correctly identify the origin of information being processed. Recent work has suggested common neural substrates for perception and imagery. However, it has remained unclear how the brain differentiates between external and internal experiences with shared neural codes. Here we tested this question in human participants (male and female) by systematically investigating the neural processes underlying the generation and maintenance of visual information from voluntary imagery, veridical perception, and illusion. The inclusion of illusion allowed us to differentiate between objective and subjective internality: while illusion has an objectively internal origin and can be viewed as involuntary imagery, it is also subjectively perceived as having an external origin like perception. Combining fMRI, eye-tracking, multivariate decoding, and encoding approaches, we observed superior orientation representations in parietal cortex during imagery compared with perception, and conversely in early visual cortex. This imagery dominance gradually developed along a posterior-to-anterior cortical hierarchy from early visual to parietal cortex, emerged in the early epoch of imagery and sustained into the delay epoch, and persisted across varied imagined contents. Moreover, representational strength of illusion was more comparable to imagery in early visual cortex, but more comparable to perception in parietal cortex, suggesting content-specific representations in parietal cortex differentiate between subjectively internal and external experiences, as opposed to early visual cortex. These findings together support a domain-general engagement of parietal cortex in internally generated experience.
SIGNIFICANCE STATEMENT How does the brain differentiate between imagined and perceived experiences? Combining fMRI, eye-tracking, multivariate decoding, and encoding approaches, the current study revealed enhanced stimulus-specific representations in visual imagery originating from parietal cortex, supporting the subjective experience of imagery. This neural principle was further validated by evidence from visual illusion, wherein illusion resembled perception and imagery at different levels of cortical hierarchy. Our findings provide direct evidence for the critical role of parietal cortex as a domain-general region for content-specific imagery, and offer new insights into the neural mechanisms underlying the differentiation between subjectively internal and external experiences.
Introduction
In complex environments, humans are overwhelmed with information from external and internal origins. To function properly, the brain needs to differentiate between what is perceived externally and what is imagined internally (Dijkstra et al., 2022). Extensive evidence has suggested a large overlap in neural processing between imagery and perception, in terms of common univariate BOLD activations (Ishai et al., 2000; Ganis et al., 2004), and shared neural representations between perceived and imagined contents as revealed by multivariate decoding approaches in visual cortex (Stokes et al., 2009; Reddy et al., 2010; Albers et al., 2013; Ragni et al., 2020). Yet, it is less understood why imagery and perception feel so different despite these similarities (Koenig-Robert and Pearson, 2021).
Previous work has demonstrated a reduced level of neural activation (Ishai et al., 2000; Ganis et al., 2004; Keogh et al., 2020), as well as weaker stimulus-specific representations (Lee et al., 2013; Dijkstra et al., 2018), during imagery in early visual cortex (EVC) compared with perception, given that perception is driven by sensory signals reaching EVC first, whereas imagery does not receive any direct external stimulation (Iamshchinina et al., 2021; Yu and Postle, 2021). This account could possibly explain why perception evokes a stronger sensory experience than imagery (Dijkstra et al., 2017; Dijkstra and Fleming, 2023), but does not explain how imagery contents are generated in the absence of external stimulation or why imagery feels internal. Although previous work has demonstrated top-down feedback modulation from frontoparietal cortex during imagery (Mechelli et al., 2004), most of the observations were based on univariate activations, and do not address whether these higher-order brain regions carry content-specific neural representations that discriminate between imagery and perception. Here, we tested the idea that neural signals contributing to the subjective internality of imagery arise from higher-order brain regions that serve as the source of content-specific imagery representations. As discussed earlier, external experience of perception is supported by stronger neural representations in EVC; with this logic, we propose that content-specific imagined representations should exceed perceptual representations in (some) higher-order regions, resulting in the internal experience of imagery. Moreover, if such a neural principle exists, it should be generalizable to conditions that are neither completely internal nor external, such as illusion (Bergmann et al., 2019). Illusion represents another type of sensory experience that is dissociated from direct sensory inputs and can be conceptualized as involuntary imagery (Pearson, 2019). 
On the other hand, similar to veridical perception, illusion creates the subjective feeling of external stimulation. In other words, illusion and perception can be considered subjectively external experiences, while imagery is subjectively internal. We therefore predict that illusion should share similarities with perception and imagery at different levels of cortical processing: it should behave more like imagery at the early stage of cortical processing, and should resemble perception to a larger extent at the later stages.
Here we propose that parietal cortex remains a good candidate as the source region for content-specific imagery, for several reasons. First, although earlier research suggested variable representations of imagined contents in parietal cortex, recent advances in visual working memory research have suggested robust stimulus-specific information in working memory in parietal cortex (Sprague and Serences, 2013; Ester et al., 2015; Yu and Shim, 2017). Given the tight link between working memory and imagery (Albers et al., 2013), it is reasonable to expect robust stimulus-specific representations in parietal cortex during imagery. Second, parietal cortex encodes various types of information during visual imagery and working memory, including spatial (Sprague and Serences, 2013), feature (Yu and Shim, 2017), object (Ragni et al., 2020), and pictorial (Breedlove et al., 2020) information. Last, parietal cortex also encodes information in various illusory contexts (Liu et al., 2019; Arsenovic et al., 2022). Despite all this, to our knowledge, no previous study has directly tested whether representations of imagery exceed those of perception and illusion in any brain region, including parietal cortex.
Materials and Methods
Participants
A total of 52 volunteers participated in the study. All were recruited from the Chinese Academy of Sciences Shanghai Branch community, and were naive to the purpose of the study. Seventeen volunteers (6 males, mean age = 23.4 ± 1.5 years) participated in Experiment 1. Twenty-one volunteers (9 males, mean age = 24.9 ± 1.7 years) participated in the main experiment of Experiment 2. Three participants were excluded because of excessive head motion or technical problems during fMRI scanning, leaving 18 participants in the final sample (8 males, mean age = 24.8 ± 1.8 years). Five volunteers (2 males, mean age = 24.0 ± 1.4 years) participated in the retrocue version of Experiment 2. Seventeen volunteers (3 males, mean age = 23.8 ± 1.8 years) participated in the eye-tracking version of Experiment 2. Four participants were excluded because of excessive eye blinks/head motion or technical problems during recording, resulting in a final sample size of 13 (2 males, mean age = 23.7 ± 1.6 years). We did not determine the sample size of each experiment a priori, but the sample size used in the current study was comparable to those in studies with a similar approach (Ester et al., 2015; Liu et al., 2019; Yu and Postle, 2021). Several volunteers participated in more than one experiment: one volunteer participated in the main experiments of both Experiments 1 and 2. All volunteers in the retrocue version also participated in the main experiment of Experiment 2. One volunteer participated in both Experiment 1 and the eye-tracking experiment, and one volunteer participated in both the main experiment of Experiment 2 and the eye-tracking experiment. All participants had normal or corrected-to-normal vision and reported no neurologic or psychiatric disease. All provided written informed consent approved by the ethics committee of the Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, and were monetarily compensated for their participation. 
Participants filled out a revised version of the Vividness of Visual Imagery Questionnaire (VVIQ) (Marks, 1973) as an evaluation of their general imagery ability (1-5 points of rating: 1, no image at all; 5, as vivid as normal vision), and those who reached an average VVIQ score of >2 (Kay et al., 2022) proceeded to the main experiment. All of our participants met this criterion.
Stimuli and procedure
Overview
The current study investigated neural processes underlying the generation and maintenance of stimulus-specific information during voluntary imagery, veridical perception, and illusion. Typical imagery tasks often involve a delay period during which imagery contents emerge and persist, whereas perception does not. Hence, instead of simply contrasting memory delay of imagery and sample encoding of perception, which pertain to two temporally distinct processes, we embedded a delay period into each condition, which would allow for comparisons of neural codes between distinct processes. In all three conditions, the trial started with the presentation of a perceptual or illusory stimulus, or an imagery cue, followed by a common delay period before a response was made. We referred to these three conditions as perception-based, imagery-based, and illusion-based delayed-recall, respectively (perception, imagery, and illusion in short). This design allowed us to track the temporal evolution of neural representations in the perception and illusion conditions, as has typically been done only for the imagery condition in previous work. We used two distinct sets of stimuli to test our hypothesis: moving Gabors along oriented paths in Experiment 1 and static line orientations in Experiment 2; the way imagery was cued also differed between experiments. Below we present details of stimulus generation and experimental procedure in both experiments.
For fMRI, all stimuli were generated using MATLAB R2012b (The MathWorks) and Psychtoolbox-3. Stimuli were presented on a SINORAD monitor screen (37.5 × 30 cm, 1280 × 1024 pixels at 60 Hz) at the back of the scanner bore and projected onto a mirror mounted on the head coil, with a viewing distance of 90.5 cm. Participants lay supine in the scanner and completed the task using two SINORAD two-key button boxes. Eye-tracking was performed using MATLAB R2018a, with a viewing distance of 70 cm and a screen resolution of 1680 × 1050.
Experiment 1
In Experiment 1, stimuli were Gabor patterns (a full contrast sinusoidal grating weighted by a Gaussian function) with a spatial frequency of 0.94 cycles per degree and a diameter of 1.6° of visual angle, presented against a uniform gray background (RGB values: [128, 128, 128]). On perception trials, the Gabor pattern moved back and forth along a tilted linear path, either leftward or rightward relative to the vertical line, with its internal grating texture remaining static. The motion direction of the Gabor reversed every 1 s. For example, for a leftward path, the Gabor first moved toward the top left corner for 1 s, then reversed its direction and moved toward the bottom right corner for another 1 s. The path orientation was determined by an illusion size measurement task (see below), individually for each participant before scanning. The speed in the vertical direction was 5°/s, whereas the speed in the horizontal direction and the path length varied across participants based on the orientation of the path. On imagery trials, the stimulus was replaced by a symbolic cue centered at the fixation point. The letters L and R were used to prompt participants to form imagery of a Gabor moving back and forth along a leftward or rightward pathway, respectively (relative to the vertical line) in the visual periphery at the same location as the perceptual and illusory stimuli without actually viewing the Gabor pattern. For two participants, light gray and dark gray circles were used as cues instead of letters, and were presented peripherally at the center of the motion path. On illusion trials, the Gabor pattern moved back and forth vertically along a linear path with a speed of 5°/s (external motion), while its internal texture drifted horizontally to the left or right with a temporal frequency of four cycles/s (internal motion), making a double-drift stimulus (Liu et al., 2019). 
By presenting the stimulus in the participant's peripheral visual field, the combination of the external motion and the orthogonal internal motion would tilt the perceived motion path either to the left or right relative to the physical motion path (i.e., inducing a visual illusion of the path orientation). The length of the motion trajectory was 5°, and the direction of the internal motion reversed at the two endpoints, in synchrony with path reversals, every 1 s. For perception and illusion conditions, the midpoint of the motion trajectory was located at 5° horizontally to the right of the screen center, and the initial spatial phase of the Gabor pattern was randomly generated for each trial. Moreover, a fixation point (0.3° diameter) in black was displayed at 3° horizontally to the left of the screen center, making the stimulus appear in a more peripheral visual field (8° of eccentricity) for participants (see Fig. 1A). The orientation of the motion path (left or right) served as the labels for decoding in the subsequent analyses. The Gabor moved back and forth along the motion path in perception and illusion trials to allow the double-drift illusion to stabilize. Because the focus of decoding was path orientations rather than motion directions, the use of two directions of motion could eliminate the effect of motion direction on decoding.
Before the main task, participants underwent a measurement of the size of their double-drift illusion inside the scanner. Stimuli parameters were identical to those used in the main task. Participants were instructed to keep their gaze at the fixation point throughout all tasks, unless specified. Each trial began with a 1 s fixation period. Following that, a Gabor pattern appeared in the periphery and moved back and forth along a vertical path for 2 s, with its internal texture drifting in an orthogonal direction (double-drift left-tilted and double-drift right-tilted trials) or remaining static (no-drift trials). After that, a response needle (0.05° in width, 5° in length) centered at the fixation point was presented. Participants used four buttons to rotate the needle until it matched their perceived angle of the motion trajectory of the Gabor pattern as precisely as possible. The initial orientation of the needle was randomly chosen on every trial. Participants made clockwise or counterclockwise rotations in steps of 1° by pressing one of two buttons (labeled with 1 and 2) on the right button-box with their thumb, and could quickly rotate the bar by 90° via pressing the 3 button on the left button-box. Participants pressed the “enter” button (labeled with 4) on the left button-box to indicate they were satisfied with their response and to start the next trial. In each run, three types of stimuli (left-drift, right-drift, and no-drift of the Gabor internal texture) were randomized across trials, and each type was presented on one-third of the 30 trials. Twelve participants completed 2 runs of the measurement task and five participants completed three runs, lasting 9-14 min in total. To ensure accurate responses, only data from the last run were used to calculate the orientation of the physical path of the perception stimulus in the main task. This measured illusion size was taken as the orientation of the motion path for the perception trials in the main task. 
In other words, the orientation of the motion path in the three conditions was designed to be matched within each participant, which we referred to as the anchor orientation.
On each trial of the delayed-recall task (main task), a sample stimulus (a symbolic cue or a moving Gabor pattern) was displayed for 2 s, followed by an 8.5 s delay. During perception and illusion trials, participants viewed the moving Gabor pattern displayed in their right peripheral visual field and were instructed to remember it over the delay. During imagery trials, participants were prompted by the cue to actively imagine a Gabor moving back and forth along the corresponding (leftward or rightward) trajectory during the delay. Following the delay, participants rotated a needle (0.05° in width, 5° in length) centered at the fixation point to match the orientation of the trajectory of the memorized or imagined moving Gabor pattern. Participants had 4.5 s to respond using the same four buttons as in the measurement task. Each trial ended with an intertrial interval (ITI) of 4.5, 6, or 7.5 s. In each run, participants completed 18 trials (6 min and 25.5 s per run), and the conditions (imagery, perception, and illusion) and stimulus directions (i.e., leftward or rightward trajectory) were balanced and randomly ordered across trials. Overall, 16 participants completed 14 runs (252 trials in total) within a single fMRI session lasting ∼120 min, except for one subject (S002) who completed 12 runs because of a technical problem with the scanner. Before the fMRI session, participants practiced 2 or 3 runs of the measurement task and 1 or 2 runs of the delayed-recall task outside the scanner to ensure they understood the task instructions correctly and were comfortable with the button responses.
Experiment 2
In Experiment 2, sample stimuli were oriented bars randomly chosen from a continuous orientation space from 1° to 180° in 1° increments. On perception trials, the stimulus was composed of two distant discs (0.8° in diameter) connected by a bar (0.6° in width), whereas on imagery trials, the stimulus consisted of two distant discs only. Hence, participants were required to memorize a physically present oriented bar on perception trials, whereas they needed to imagine the missing bar between the two discs on imagery trials. Inspired by the classic Kanizsa triangle illusion, the stimulus on illusion trials consisted of two distant discs with opposite openings facing each other (see Fig. 1C), resulting in a subjective visual perception of a completed illusory contour that defined an oriented bar occluding the discs at both ends (thus perceiving an outline that did not actually exist). All stimuli (4° radius) were presented centrally on a uniform gray background screen (RGB values: [128, 128, 128]) in black. To prevent participants from memorizing the location of the end discs instead of the bar orientation, a jitter of ± 0.4° was applied to the radius on each trial.
The trial time course of the delayed-recall task in Experiment 2 was similar to that of Experiment 1: on each trial, a sample stimulus with a randomly selected orientation was displayed for 1 s without a fixation point, followed by a 9.5 s delay. During perception and illusion trials, participants were required to memorize the perceived oriented bar throughout the delay, while during imagery trials, participants were required to internally generate a mental image of a bar connecting two end discs and maintain it in their mind's eye. After the delay, a probe dial (4° radius) appeared centrally with its needle in a random initial orientation. Participants rotated the needle to match the maintained orientation within a 4.5 s response window. The usage of response buttons was identical to that in Experiment 1. The ITI was 6, 7.5, or 9 s. At the end of each run, participants received feedback regarding their overall performance. Each run consisted of 18 trials (6 of each condition) presented in a randomized order, lasting 6 min 52.5 s. Participants completed 20 runs of the delayed-recall task divided across two fMRI scanning sessions, and the orientations of sample stimuli evenly covered the entire orientation space across 10 consecutive runs within each session. In total, 17 participants completed 360 trials (i.e., 120 trials for each condition), and one participant (S012) completed 342 trials (i.e., 19 runs). Moreover, 5 runs for S006 were not included in subsequent analyses because of alignment failure, and 10 runs for S021 were excluded because of a technical problem with the scanner. Before the fMRI sessions, participants underwent a practice session, which ended when the mean absolute response error of the last practice run fell below 15°.
In addition to the delayed-recall task, Experiment 2 included an independent perception-mapping task to map out sensory-evoked signals of each orientation used in the main task (Harrison and Tong, 2009; Rademaker et al., 2019; Yu and Shim, 2019). The stimuli in this task were identical to the sample stimuli in perception trials of the main task. On each trial, a stimulus (an oriented bar with two end discs) flickering at 1.8 Hz was presented for 4.5 s, followed by an ITI of 6, 7.5, or 9 s. Each run consisted of 30 trials and lasted 4 min 37.5 s. During the task, participants maintained their attention on the fixation point and performed a detection task. They were instructed to press the “1” button as quickly as possible whenever the fixation point turned green. Data from six runs of the perception-mapping task were acquired across two fMRI scanning sessions for 17 participants, and stimuli of 180 trials evenly covered 1° to 180°; and data from five runs were obtained for one participant (S012). For one participant (S021), data from three runs in the first session were not included in subsequent analyses because of a technical problem with the scanner. Participants also practiced 2 or 3 runs to achieve an accuracy of above 70% before scanning.
A subset of participants (n = 5) performed a retrocue version of Experiment 2 in two additional scanning sessions. The inclusion of the retrocue task aimed to further control for the potential influences of sample-driven activity on delay-period signals, as used in previous literature (Harrison and Tong, 2009; Dijkstra et al., 2018; Yu and Shim, 2019). In the retrocue version, the trial time course was similar to that in the main experiment of Experiment 2, except that two sample stimuli were presented sequentially, followed by a retrocue, before the onset of the delay. Specifically, each sample was presented for 0.8 s with an interstimulus interval of 0.4 s. At 0.4 s after the offset of the samples, a retrocue (a digit of “1” or “2”) was presented for 0.6 s. Participants were required to maintain the cued sample during the delay. The response window was 4.5 s, and the ITI was 4.5, 6, or 7.5 s. Each run consisted of 18 trials, and participants completed 20 runs across two additional scanning sessions.
A separate eye-tracking version of Experiment 2 was also conducted to record participants' eye gaze positions during the delayed-recall task. The trial time course and stimuli were also similar to those in the main experiment of Experiment 2, except that sample and delay periods were shortened to 0.4 and 2.4 s, respectively, and the radius of stimuli was changed to 2.5°. Trials from different conditions were conducted in separate runs in a randomized order instead of being interleaved within a single run. Each run consisted of 60 trials, and participants performed six runs in total. Participants' eye gaze positions were recorded using a Desktop mount Eyelink 1000 Plus eye tracker (SR Research) at a sampling rate of 1000 Hz.
Behavioral analyses
For illusion size measurement in Experiment 1, the perceived orientation of each trial was calculated as the absolute circular distance between the response orientation and the vertical line. The difference between the mean perceived orientation on double-drift trials and that on no-drift trials was then taken as the orientation of the physical path on perception trials in the main task (i.e., the anchor orientation). For the delayed-recall task in Experiment 1, recall error was computed as the angular difference between the reported and anchor orientations. Moreover, the SD of the reported orientations (relative to the vertical line) in each condition was computed to indicate recall variability. Differences between conditions were characterized using one-way repeated-measures ANOVAs. The behavioral data of three participants (S001, S002, and S003) were not recorded because of a coding error, so the behavioral results for Experiment 1 reported here were based on data from 14 participants. Similar analyses were performed for recall error and reaction time in Experiment 2, in which recall error was computed as the absolute circular distance between the response orientation and the sample orientation.
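The circular-distance computations described above can be illustrated with a minimal sketch. It assumes orientations in degrees on a 180° circular space with vertical at 0°; the function name `circular_distance_deg` and all response values are hypothetical and are not data from the study.

```python
def circular_distance_deg(a, b, period=180.0):
    """Signed smallest difference a - b on a circular orientation space."""
    d = (a - b) % period
    if d > period / 2:
        d -= period
    return d


def mean(values):
    return sum(values) / len(values)


# Hypothetical needle responses in degrees relative to vertical (0 deg).
double_drift_responses = [22.0, 25.0, 19.0]
no_drift_responses = [2.0, -1.0, 2.0]

# Perceived orientation per trial: absolute circular distance from vertical.
perceived_dd = mean([abs(circular_distance_deg(r, 0.0)) for r in double_drift_responses])
perceived_nd = mean([abs(circular_distance_deg(r, 0.0)) for r in no_drift_responses])

# Anchor orientation: double-drift mean minus no-drift mean.
anchor_orientation = perceived_dd - perceived_nd

# Experiment 2 style recall error: absolute circular distance between
# response and sample. 175 deg and 5 deg are only 10 deg apart on the circle.
recall_error = abs(circular_distance_deg(175.0, 5.0))
```

Wrapping the difference into the range (-90°, 90°] keeps orientations near the 1°/180° boundary from being scored as large errors.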
Data acquisition
MRI scanning was performed at the Center for Excellence in Brain Science and Intelligence Technology, Functional Brain Imaging Platform on a Siemens 3T Tim Trio MRI scanner with a 32-channel head coil. High-resolution T1-weighted anatomic images were acquired using an MPRAGE sequence (2300 ms TR, 3 ms TE, 9° flip angle, 256 × 256 matrix, 192 sequential sagittal slices, 1 mm isotropic voxel size). Whole-brain functional images were acquired using a Multiband 2D gradient-echo echo-planar (MB2D GE-EPI) sequence with a multiband factor of 2, 1500 ms TR, 30 ms TE, and 60° flip angle within a 74 × 74 matrix (46 axial slices, 3 mm isotropic voxel size). For three participants (S001, S006, and the first session of S005), functional images were acquired using the MB2D GE-EPI sequence with a flip angle of 90°; all other parameters were the same as above.
fMRI preprocessing
Preprocessing of fMRI data was performed using AFNI (afni.nimh.nih.gov) (Cox, 1996). The first five volumes of each functional run were discarded. Subsequently, data from each scanning session were first registered to the last volume of the last run in the same session and then to the T1-weighted anatomic images obtained in the first session. After within-session (Experiment 1) or cross-session (Experiment 2) alignment, motion correction and detrending (linear, quadratic, cubic) were applied to the data. Voxel-wise signal time series were z-scored on a run-by-run basis for subsequent univariate and multivariate analyses.
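As an illustration of run-by-run z-scoring, the following minimal sketch normalizes one voxel's time series separately within each run. It is not the AFNI pipeline used in the study: the function name `zscore_by_run`, the use of the population SD, and the signal values are assumptions made for the example.

```python
from statistics import mean, pstdev


def zscore_by_run(timeseries, run_labels):
    """Z-score one voxel's time series separately within each run."""
    out = [0.0] * len(timeseries)
    for run in sorted(set(run_labels)):
        idx = [i for i, r in enumerate(run_labels) if r == run]
        vals = [timeseries[i] for i in idx]
        mu = mean(vals)
        sd = pstdev(vals)  # assumption: population SD
        for i in idx:
            out[i] = (timeseries[i] - mu) / sd if sd > 0 else 0.0
    return out


# Hypothetical signal from two runs with different baselines and scales.
signal = [100.0, 102.0, 104.0, 200.0, 210.0, 220.0]
runs = [1, 1, 1, 2, 2, 2]
z = zscore_by_run(signal, runs)
```

After normalization, both runs have zero mean and unit variance, so differences in baseline signal between runs do not enter the later pattern analyses.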
ROI definition and voxel selection
We used the probabilistic atlas developed by Wang et al. (2015) to identify the individual anatomic ROIs. Masks of the probabilistic atlas, including V1, V2, V3, hV4, V3AB, VO1, VO2, hMT, MST, IPS0-5, and FEF, were first warped back to each participant's structural images in their native space. After that, we extracted the unilateral masks of V1, V2, and V3 of each participant and merged them within (for Experiment 1) or between (for Experiment 2) hemispheres to create the anatomic EVC ROIs. Similarly, we merged the masks of IPS0-5 to create the anatomic intraparietal sulcus (IPS) ROIs. As a control, we also used the FEF mask to create the anatomic superior precentral sulcus (sPCS) ROIs in frontal cortex, and merged the masks of hMT and MST to create the anatomic MT+ ROIs because Experiment 1 involved moving stimuli.
To create the functionally defined ROIs, we conducted voxel selection within each anatomic ROI based on task-related activations. A conventional mass-univariate GLM analysis was implemented in AFNI, where the GLM included six nuisance regressors regarding the participant's head motion and three regressors of interest: sample, delay, and probe periods of the task modeled with boxcars (with a duration of 2, 8.5, and 4.5 s, respectively, for Experiment 1, and a duration of 1, 9.5, and 4.5 s, respectively, for Experiment 2) that were convolved with a canonical HRF. The sample – baseline contrast was used to define the functional EVC ROIs, and the delay – baseline contrast was used to define the remaining functional ROIs, by selecting 500 voxels that displayed the strongest responses to the specific contrast within the anatomic ROI. Because of the different positions of stimulus presentation in two experiments (right peripheral vs central), in Experiment 1, the functional ROIs were identified separately within the left (i.e., contralateral) and right (i.e., ipsilateral) anatomic ROIs; and in Experiment 2, identified within the anatomic ROIs merged between hemispheres.
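The voxel selection step amounts to keeping the N voxels with the largest contrast values inside an anatomic mask. The sketch below shows that ranking on a toy example. The function name `select_top_voxels` and the contrast values are hypothetical, and whether the ranking used t statistics or beta estimates is not stated in the text.

```python
def select_top_voxels(contrast_values, n_voxels=500):
    """Return sorted indices of the n voxels with the largest contrast values."""
    order = sorted(range(len(contrast_values)),
                   key=lambda i: contrast_values[i],
                   reverse=True)
    return sorted(order[:n_voxels])


# Hypothetical contrast values for six voxels inside one anatomic ROI.
contrast = [0.3, 2.1, -0.5, 4.0, 1.2, 3.3]

# Keep the three strongest voxels (the study kept 500 per ROI).
selected = select_top_voxels(contrast, n_voxels=3)
```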
fMRI analyses: multivariate pattern analysis
For Experiment 1, to examine how well the orientation of the motion path (left vs right) of the perceived/imagined Gabor pattern could be discriminated by the multivoxel response patterns in different brain regions, we performed an ROI-based multivariate pattern analysis, separately for each participant, using a two-class support vector machine (SVM) classifier built into MATLAB R2018b (“fitcecoc” and “predict” functions). The SVM classifier was trained and tested on preprocessed and z-scored BOLD data within each condition, using a leave-one-run-out cross-validation procedure (i.e., patterns from all-but-one runs served as the training set, whereas patterns from the remaining run constituted the test set). In each condition, decoding accuracies were obtained by calculating the proportion of trials in which the classifier correctly predicted the path orientation among all trials of the condition. To investigate the temporal dynamics and similarity of neural codes between conditions, we also performed temporal generalization and cross-condition decoding analyses, using the same leave-one-run-out procedure. Specifically, in the temporal generalization analysis, for each condition, the classifier was trained on data from each time point, and tested on data from every time point. In the cross-condition decoding analysis, the classifier was trained on data from each condition, and tested on data from all three conditions. All decoding analyses were performed for each time point and ROI separately. Trials with opposite responses (i.e., responding “left” on right trials, or vice versa) were excluded from the decoding analyses. The remaining trials were further balanced between left and right trials within each condition to avoid overfitting in decoding. After this procedure, a total of 2.1 ± 1.5 trials were excluded for each condition on average.
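The leave-one-run-out procedure can be sketched as follows. This is not the MATLAB SVM pipeline used in the study: a nearest-centroid classifier stands in for the SVM so that the example needs only the standard library, and the patterns are synthetic. All function names are hypothetical.

```python
import random


def centroid(rows):
    """Mean pattern across trials (each row is one trial's voxel pattern)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_run_out_accuracy(patterns, labels, runs):
    """Train on all-but-one run, test on the held-out run, pool over folds."""
    correct = 0
    for held_out in sorted(set(runs)):
        train = [i for i, r in enumerate(runs) if r != held_out]
        test = [i for i, r in enumerate(runs) if r == held_out]
        # Stand-in classifier: one centroid per class from the training runs.
        centroids = {}
        for lab in sorted(set(labels[i] for i in train)):
            centroids[lab] = centroid([patterns[i] for i in train if labels[i] == lab])
        for i in test:
            pred = min(centroids, key=lambda lab: sq_dist(patterns[i], centroids[lab]))
            correct += int(pred == labels[i])
    # Proportion of correctly classified trials among all trials.
    return correct / len(labels)


# Synthetic data: 6 runs, 3 "left" and 3 "right" trials per run, 5 voxels.
rng = random.Random(0)
patterns, labels, runs = [], [], []
for run in range(6):
    for lab, mu in (("left", -1.0), ("right", 1.0)):
        for _ in range(3):
            patterns.append([mu + rng.gauss(0.0, 0.5) for _ in range(5)])
            labels.append(lab)
            runs.append(run)

accuracy = leave_one_run_out_accuracy(patterns, labels, runs)
```

Holding out whole runs rather than single trials keeps the training and test data from sharing run-specific noise. Temporal generalization follows the same loop, with training and test patterns drawn from different time points.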
fMRI analyses: inverted encoding model (IEM)
In Experiment 2, sample orientations covered the entire stimulus space (1°-180°). To reconstruct the neural representation of fine-grained orientation in each ROI, the IEM analyses were implemented using custom MATLAB code (Brouwer and Heeger, 2009, 2011; Ester et al., 2015; Yu and Shim, 2017; Yu and Postle, 2021). The IEM assumes that the activation of each voxel could be characterized by a linear combination of a set of channel responses expressed by idealized tuning functions. Here, nine orientation channels equally spaced between 1° and 180° were used, and the response of each channel was modeled as a sinusoid raised to the eighth power as follows:
$$R = \sin^{8}(x)$$
Independent training data B1 (m voxels × n1 trials) were combined with the stimulus orientation channel profiles C1 (k channels × n1 trials) to estimate an encoding model consisting of a weight matrix W (m voxels × k channels) which quantified the approximate contribution of each channel to the observed BOLD response in an ROI (i.e., B1). The relationship between B1, C1, and W can be described as follows:
$$B_1 = W C_1$$
The weight matrix was computed through least-square linear regression as follows:
$$W = B_1 C_1^{T} \left(C_1 C_1^{T}\right)^{-1}$$
Next, the encoding model was inverted such that it became a decoder that was used to estimate the channel responses C2 (k channels × n2 trials) for the testing data B2 (m voxels × n2 trials):
$$C_2 = \left(W^{T} W\right)^{-1} W^{T} B_2$$
To make the estimated channel responses fully cover the stimulus space which comprised all integers ranging from 1° to 180°, the above steps were repeated 20 times and the centers of nine tuning functions were shifted by 1° on each iteration.
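The training, inversion, and iterative shifting steps can be sketched in Python as follows. This is an illustrative reimplementation under stated assumptions, not the custom MATLAB code used here: the tuning function is written as a cosine of the orientation offset raised to the eighth power (one way to obtain a 180°-periodic sinusoid peaking at the channel center), the data are simulated and noise-free, and the names `channel_responses`, `train_iem`, and `invert_iem` are hypothetical.

```python
import numpy as np

def channel_responses(orientations_deg, centers_deg, power=8):
    """Idealized tuning curves (k channels x n trials)."""
    ori = np.asarray(orientations_deg, float)[None, :]
    cen = np.asarray(centers_deg, float)[:, None]
    return np.cos(np.deg2rad(ori - cen)) ** power    # even power gives a 180-deg period

def train_iem(B1, C1):
    """Least-squares weights W (m voxels x k channels) such that B1 ~ W @ C1."""
    return B1 @ C1.T @ np.linalg.inv(C1 @ C1.T)

def invert_iem(W, B2):
    """Estimated channel responses C2 (k channels x n2 trials) for test data B2."""
    return np.linalg.inv(W.T @ W) @ W.T @ B2

# Simulated, noise-free data (for illustration only)
rng = np.random.default_rng(1)
base_centers = np.arange(1, 181, 20)                 # nine centers: 1, 21, ..., 161 deg
W_true = rng.normal(size=(40, 9))                    # 40 simulated voxels
train_ori = rng.integers(1, 181, size=200)           # integer orientations in 1..180
test_ori = rng.integers(1, 181, size=50)
B1 = W_true @ channel_responses(train_ori, base_centers)
B2 = W_true @ channel_responses(test_ori, base_centers)

full = np.zeros((180, test_ori.size))                # row i = channel centered at (i + 1) deg
for shift in range(20):                              # shift all nine centers by 1 deg per iteration
    centers = base_centers + shift
    W = train_iem(B1, channel_responses(train_ori, centers))
    full[centers - 1] = invert_iem(W, B2)
peak_deg = full.argmax(axis=0) + 1                   # orientation of the strongest channel per trial
```

With nine channels spaced 20° apart, 20 one-degree shifts fill all 180 rows of `full`, so each test trial ends up with one estimated channel response per integer orientation.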
Finally, all estimated channel responses were circularly shifted to a common center (0° on the x axis representing the sample orientation of each trial) and averaged across trials, resulting in a single reconstruction for visualization and subsequent statistical comparisons. To quantify the strength of IEM reconstructions, the averaged channel responses on both sides of the common center were collapsed over and averaged, and the slope of each collapsed reconstruction was then computed through linear regression (Foster et al., 2017; Yu and Postle, 2021). A larger positive slope indicated a stronger positive representation.
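The recentering and slope computation can be sketched as below. This is a hedged illustration rather than the original implementation: it assumes a 180-row reconstruction matrix (one row per integer orientation), and it regresses the collapsed response on negative distance from the sample orientation so that a reconstruction peaking at the center yields a positive slope. The exact regressor coding used in the study is not stated; `reconstruction_slope` is a hypothetical name.

```python
import numpy as np

def reconstruction_slope(full, orientations_deg):
    """full: 180 x n trials, row i = channel centered at (i + 1) deg."""
    n_chan = full.shape[0]
    half = n_chan // 2
    # circularly shift each trial so its sample orientation lands at index `half` (0 deg offset)
    centered = np.stack([np.roll(full[:, j], half - (int(ori) - 1))
                         for j, ori in enumerate(orientations_deg)], axis=1)
    mean_recon = centered.mean(axis=1)               # average across trials
    dist = np.arange(half + 1)                       # 0..90 deg away from the sample orientation
    # collapse the two sides of the common center and average them
    collapsed = (mean_recon[half - dist] + mean_recon[(half + dist) % n_chan]) / 2
    slope = np.polyfit(-dist, collapsed, 1)[0]       # positive when responses peak at the center
    return slope, collapsed

# Simulated reconstructions: one peaked at each trial's orientation, one flat
centers = np.arange(1, 181)
oris = np.array([10, 95, 170])
peaked = np.cos(np.deg2rad(centers[:, None] - oris[None, :])) ** 8
slope_peaked, collapsed = reconstruction_slope(peaked, oris)
slope_flat, _ = reconstruction_slope(np.ones((180, 3)), oris)
```

A flat reconstruction gives a slope near zero, whereas a reconstruction peaked at the sample orientation gives a positive slope, matching the interpretation of slope as representational strength.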
The IEM analyses reconstructed population-level representations of orientations and allowed us to compare the orientation representational strength between different conditions within each ROI at any given time point. To ensure a fair comparison between conditions, we focused on the results of two IEM analyses where the IEM decoded testing data from different conditions with a common inverted encoder (Sprague et al., 2018): (1) a mixed IEM trained on data from all conditions (perception, imagery, and illusion) and tested on data from each condition of the delayed-recall task; (2) an independent IEM trained on data averaged over 4.5-6 s (4th-5th TRs; to avoid confusion, all the time points in the main text and methods refer to the onset of the corresponding TRs) of the perception-mapping task and tested on data from each condition of the delayed-recall task. Notably, the mixed IEM was trained and tested with a leave-one-run-out cross-validation procedure, while the perception-mapping IEM was not. In addition, we also performed temporal generalization and cross-condition IEM analyses, following a similar procedure as in Experiment 1. All IEM analyses described above were performed for each time point and ROI separately. In the retrocue version of Experiment 2, two sample stimuli with different orientations were presented. We used the retrocued orientation to train the IEM, together with a mixed-model approach.
Univariate analyses
In addition to multivariate analyses, we also characterized univariate BOLD time course in each ROI during the delayed-recall task. Specifically, we calculated the signal change in BOLD activity relative to baseline for each time point, where baseline was defined as the BOLD activity of the first TR of each trial. For each condition, the BOLD signal change was averaged across all voxels in each ROI and across trials within the condition.
To test whether the observed differences in multivariate decoding were simply caused by aggregated univariate BOLD activation differences, for a given TR, the mean BOLD activity was calculated by averaging the BOLD signals across trials of each condition and across voxels within each ROI. For each trial, the corresponding mean BOLD activity was then subtracted from data from each voxel to generate new data controlled for activation differences between conditions. The same multivariate analyses were then performed on the mean-removed BOLD data.
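A minimal Python sketch of this mean-removal control is given below, assuming a trials × voxels matrix for a single TR and ROI. The data are simulated and the function name `remove_condition_mean` is hypothetical.

```python
import numpy as np

def remove_condition_mean(data, conditions):
    """Subtract, for each condition, its mean BOLD over trials and voxels (one TR, one ROI)."""
    data = np.asarray(data, float)
    conditions = np.asarray(conditions)
    out = data.copy()
    for c in np.unique(conditions):
        idx = conditions == c
        out[idx] -= data[idx].mean()     # scalar mean across that condition's trials and voxels
    return out

# Simulated data: 20 trials per condition, 30 voxels, different overall activation levels
rng = np.random.default_rng(2)
conditions = np.repeat(["perception", "imagery", "illusion"], 20)
levels = np.repeat([1.0, 0.2, 0.8], 20)[:, None]
data = rng.normal(size=(60, 30)) + levels
cleaned = remove_condition_mean(data, conditions)
```

Because a single scalar is subtracted per condition, the relative pattern across voxels within each trial is left intact, and only the overall activation difference between conditions is removed.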
Experimental design and statistical analysis
Both Experiments 1 and 2 used a within-subject design with three conditions: perception, imagery, and illusion. For multivariate analyses in both experiments, a bootstrapping procedure was used to evaluate the statistical significance of the accuracy/slope and the statistical differences between accuracies/slopes. In Experiment 1, for each combination of factors (ROI, condition, or TR), by randomly sampling with replacement from the pool of all 17 participants' decoding results repeatedly, we generated 10,000 simulated samples, each consisting of 17 decoding accuracies, and calculated the mean accuracy of each sample. The one-tailed p value of the decoding accuracy was computed to indicate the probability of obtaining a below-chance (<0.5) accuracy among the 10,000 accuracies. In Experiment 2, a similar bootstrapping procedure was performed with the IEM reconstructions, resulting in 10,000 averaged IEM orientation reconstructions. Following that, the slope of each averaged reconstruction was computed and the proportion of negative slopes among the 10,000 slopes was counted as the one-tailed p value of the slope. For the time course of IEMs in each condition, the obtained p values denoting accuracy/slope significance were corrected for multiple comparisons using the false discovery rate (FDR) method across ROIs, conditions, and time points. For the temporal generalization analysis, the p values in the temporal generalization matrix were corrected using a cluster-based permutation method.
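The bootstrapped one-tailed p value for decoding accuracy can be sketched in Python as follows. This is an illustration with simulated accuracies, not the original code; `bootstrap_p_below_chance` is a hypothetical name. For Experiment 2, the same resampling would be applied to participants' reconstructions, with the slope computed on each resampled average.

```python
import numpy as np

def bootstrap_p_below_chance(values, chance=0.5, n_boot=10_000, seed=0):
    """Proportion of bootstrapped group means that fall below chance."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, float)
    # resample participants with replacement: n_boot samples of the same size as the group
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot_means = values[idx].mean(axis=1)
    return np.mean(boot_means < chance)

# Simulated accuracies for 17 participants
acc_above = np.linspace(0.55, 0.75, 17)     # every participant above chance
acc_chance = np.linspace(0.40, 0.60, 17)    # centered on chance
p_above = bootstrap_p_below_chance(acc_above)
p_chance = bootstrap_p_below_chance(acc_chance)
```

With 10,000 resamples, the smallest nonzero p value that can be obtained is 1/10,000.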
To better illustrate the differences between conditions within different ROIs, we defined an early and a late epoch out of the sample and delay periods: for each ROI, the time points of the early epoch were chosen around the peak time points of the stimulus-evoked BOLD activity (Experiment 1: 4.5-6 s [4th-5th TRs] for EVC and 4.5-7.5 s [4th-6th TRs] for IPS; main task of Experiment 2: 4.5-6 s [4th-5th TRs] for EVC and 6-7.5 s [5th-6th TRs] for IPS; retrocue version of Experiment 2: 4.5-6 s [4th-5th TRs] for EVC and 7.5-9 s [6th-7th TRs] for IPS), whereas the time points of the late epoch were chosen as the last two TRs toward the end of the delay period (9-10.5 s [7th-8th TRs] for EVC and IPS in Experiment 1 and main task of Experiment 2; 10.5-12 s [8th-9th TRs] for retrocue version of Experiment 2) during which cleaner memory-related signals were collected. For Figure 9, a common early epoch (4th-6th TRs) was used to allow direct comparisons between ROIs.
Within each epoch, we performed a two-way (ROIs × conditions) repeated-measures ANOVA to test the differences between ROIs and conditions. For pairwise comparisons between conditions, we again used a bootstrapping procedure. For instance, to test whether there were differences in accuracies between two conditions within a specific epoch and ROI, 17 pairs of accuracy were randomly sampled with replacement from the pool of 17 participants' accuracies of these two conditions, and the difference between each pair was calculated and averaged. We repeated this procedure 10,000 times, obtaining 10,000 accuracy differences, and counted the proportion of accuracy differences with opposite signs among the 10,000 accuracy differences as the one-tailed p value of the accuracy difference between these two conditions. The significance of the slope difference between two IEM reconstructions was obtained using the same method as described above. Within each epoch, p values were corrected for multiple comparisons using FDR across ROIs and comparisons.
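The paired bootstrap and the subsequent correction can be sketched as below. Two assumptions are made loudly here: "opposite signs" is taken to mean opposite to the sign of the observed mean difference, and the FDR correction is implemented as the Benjamini-Hochberg procedure, which the text does not name explicitly. The data are simulated and both function names are hypothetical.

```python
import numpy as np

def bootstrap_paired_p(a, b, n_boot=10_000, seed=0):
    """Proportion of bootstrapped mean differences whose sign opposes the observed one."""
    rng = np.random.default_rng(seed)
    diff = np.asarray(a, float) - np.asarray(b, float)       # one difference per participant
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))
    boot = diff[idx].mean(axis=1)                            # resample pairs with replacement
    return np.mean(boot * diff.mean() < 0)

def fdr_bh(pvals):
    """Benjamini-Hochberg adjusted p values (assumed FDR variant)."""
    p = np.asarray(pvals, float)
    order = np.argsort(p)
    ranked = p[order] * p.size / np.arange(1, p.size + 1)
    adjusted = np.minimum.accumulate(ranked[::-1])[::-1]     # enforce monotonicity
    out = np.empty_like(p)
    out[order] = np.clip(adjusted, 0, 1)
    return out

# Simulated accuracies for 17 participants in two conditions
rng = np.random.default_rng(3)
perception = rng.uniform(0.55, 0.65, size=17)
imagery = perception + rng.uniform(0.05, 0.15, size=17)      # imagery consistently higher
p_diff = bootstrap_paired_p(imagery, perception)
p_adj = fdr_bh([0.01, 0.04, 0.03, 0.005])
```

Resampling pairs, rather than each condition separately, preserves the within-subject structure of the design.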
In addition to p values, we also calculated the Bayes factor (BF) for the main results, using the JASP software (Love et al., 2019). Specifically, the BF10 can be understood as the ratio of the likelihood of the alternative hypothesis (H1: condition 1 ≠ condition 2) against the null hypothesis (H0: condition 1 = condition 2). A BF10 > 1 would indicate greater evidence in favor of the alternative hypothesis (i.e., a difference between conditions), while a BF10 < 1 would indicate greater evidence in favor of the null hypothesis (i.e., no difference between conditions). Typically, a BF10 > 3 would be considered substantial evidence in support of the alternative hypothesis, while a BF10 < 1/3 would be considered substantial evidence in support of the null hypothesis. We performed Bayesian paired Student's t tests for pairwise comparisons to obtain the BFs for further interpretation of the amount of evidence for the null results.
To test whether there was significant univariate BOLD activity against baseline, one-tailed, one-sample t tests against 0 were used and the resulting p values were corrected across conditions and time points using the FDR method. To examine whether there were significant differences in univariate BOLD activity between ROIs and conditions, we performed a two-way (ROIs × conditions) repeated-measures ANOVA within each epoch. For subsequent pairwise comparisons between conditions, one-tailed, paired t tests were used, and p values were FDR corrected.
Eye-tracking data analyses
Eye-tracking data in Experiment 2 were preprocessed using methods provided in previous literature (Nystrom and Holmqvist, 2010) and custom code. Data were first filtered using a Savitzky-Golay FIR smoothing filter. Eye blinks and other artifacts were removed using a velocity-based algorithm and acceleration criteria. Data were then baseline-corrected using average data from −200 to 0 ms before sample onset, and averaged within a 100 ms time window to reduce computational load. To examine whether the decoding differences between conditions were a result of differences in eye movement patterns, we decoded orientations from eye position data. Because eye position data consisted of only two dimensions (x and y coordinates), IEM was not applicable. Instead, we used SVMs to decode orientations. To be specific, we divided all trials into four bins based on their orientations (22.5°-67.5°, 67.5°-112.5°, 112.5°-157.5°, 157.5°-22.5°), and performed SVM classification between bins that were 90° apart. Decoding accuracies were then averaged across classifiers. This procedure was performed for each condition and each time point separately, and decoding results were further averaged across time points of sample and delay periods. Differences in eye position decoding performance between different conditions were evaluated using a one-way repeated-measures ANOVA.
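The binning of orientations for the eye position decoding can be illustrated with a short Python sketch. The bin indices and the function name `orientation_bin` are hypothetical; the sketch covers only the assignment of trials to bins and the pairing of bins 90° apart, not the SVM classification itself.

```python
import numpy as np

def orientation_bin(ori_deg):
    """Bin index: 0 = 22.5-67.5, 1 = 67.5-112.5, 2 = 112.5-157.5, 3 = 157.5-22.5 (wraps around)."""
    ori = np.asarray(ori_deg, float) % 180          # 180 deg is equivalent to 0 deg
    return (np.floor((ori - 22.5) / 45) % 4).astype(int)

# Bins whose centers are 90 deg apart form the two binary classification problems
classifier_pairs = [(0, 2), (1, 3)]

bins = orientation_bin([45, 90, 135, 170, 10, 180])
```

Because sample orientations were integers, no trial falls exactly on a bin boundary.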
Results
Experiment 1
Behavioral performance was comparable between perception, imagery, and illusion-based working memory in Experiment 1
Participants (n = 17) performed a delayed-recall task inside the MRI scanner, during which they memorized a physical, imagined, or illusory orientation of the motion path of a Gabor pattern over a prolonged delay as precisely as possible (Fig. 1A). On perception trials, memoranda were Gabor patterns moving either along a leftward or rightward path in the right visual periphery. On imagery trials, no sample stimulus was presented, and participants had to imagine a leftward or rightward moving Gabor with the same orientations as those in the perception condition, based on a symbolic cue. On illusion trials, memoranda were Gabor patterns moving vertically, with an internal leftward or rightward drift. This type of double-drift stimulus would create a strong illusion on the perceived path of the motion, such that the perceived motion path deviated strongly from its vertical physical path and tilted toward its internal drift direction (Liu et al., 2019). Before the main task, participants were measured on the size of their double-drift illusion (38.36 ± 5.71°). This measured illusion size was taken as the orientation of the motion path of the perception trials in the main task. In all conditions, participants reported their perceived or imagined orientation on an orientation wheel. Before the main experiment, participants filled out a revised version of the VVIQ (Marks, 1973) as an evaluation of their general imagery ability (1-5 points of rating: 1, no image at all; 5, as vivid as normal vision), with an average VVIQ score of 4.20 ± 0.55. Overall, participants' performance was highly comparable across conditions, regardless of the type of the sample stimulus (Table 1): The mean recall error, defined as the angular difference between reported and anchor orientations, did not significantly differ between conditions (F(2,26) = 2.35, p = 0.116; one-way repeated-measures ANOVA), nor did the SD of the reported orientations (F(2,26) = 1.46, p = 0.251; Fig. 1B) or the reaction time (F(2,26) = 1.26, p = 0.300).
Task design and behavioral performance. A, In Experiment 1, participants (n = 17) performed a delayed-recall task, during which they precisely memorized a physical, imagined, or illusory orientation of the motion path of a Gabor pattern over a prolonged delay. Each trial began with the presentation of a sample stimulus (leftward or rightward) from one of the conditions in the visual periphery. After a delay, participants rotated the needle presented at fixation to match the orientation of the remembered path as precisely as possible. B, Mean SD of responses in each condition of Experiment 1. Colored bars represent group mean. Error bars indicate ±1 SEM. Gray lines indicate individual participants' results. C, In the main task of Experiment 2 (n = 18), the trial structure was similar to that of Experiment 1, except that sample stimuli sets were changed to static line orientations to cover the entire orientation space. Participants performed a delayed-recall task, during which they precisely memorized a physical, imagined, or illusory orientation over a prolonged delay. Each trial began with the presentation of a centrally presented sample stimulus from one of the conditions. After a delay, participants rotated the needle presented at fixation to match the remembered orientation as precisely as possible. A subgroup of participants (n = 5) also performed a retrocue version of the task, in which two samples and a retrocue pointing to one of the samples were presented before the delay. D, Mean recall error in each condition of Experiment 2. Same conventions as in B. E, Eye position decoding across sample and delay periods in the eye-tracking version of Experiment 2 (n = 13). Same conventions as in B.
Descriptive statistics of behavioral results in Experiments 1 and 2
Orientation decoding in IPS was superior for imagery compared with perception in Experiment 1
To directly compare the representational strength of imagery and perception, here we focused on two predefined ROIs: low-level EVC (V1-V3) and IPS in higher-order parietal cortex. Because sample stimuli were presented in the right visual periphery, we focused all our primary analyses on contralateral ROIs, unless specified. We used “representational strength” to refer to orientation decoding accuracy. A higher decoding accuracy indicates a stronger orientation representation.
To investigate neural representations of stimulus-specific information in perception, imagery, and illusion, we performed multivariate pattern analysis using SVMs to explore to what extent two path orientations (i.e., left vs right) of the perceived/imagined stimuli could be discriminated on the basis of patterns of BOLD activity. Overall, significant and persistent above-chance classification performance was observed in all conditions and ROIs, beginning at slightly different time points (Fig. 2A): In EVC, path orientations were decodable from sample onset on perception trials (p values < 0.040), earlier than on imagery and illusion trials (from 3 s onward, p values < 0.034). By contrast, in IPS, decoding accuracy exceeded chance level from 3 s onward on imagery trials (p values < 0.049), earlier than on perception and illusion trials (from 4.5 s onward; p values < 0.032).
Orientation decoding performance in Experiment 1. A, Time course of decoding accuracy in perception (blue), imagery (red), and illusion (green) conditions, in EVC and IPS of Experiment 1. ROIs were contralateral to the side of stimulus presentation. Colored lines on top indicate significant time points of the corresponding condition. Vertical gray lines indicate onset of delay (at 2 s) and of probe (at 10.5 s). Horizontal lines indicate chance level of 0.5. Shaded areas represent error bars (±1 SEM). B, Left, Average decoding accuracy in perception, imagery, and illusion conditions, in early epoch of EVC and IPS of Experiment 1. Horizontal dashed lines indicate chance level of 0.5. Error bars indicate ±1 SEM. Colored asterisks on top indicate significance of the corresponding condition. Right, Differences in decoding accuracy, between imagery and perception (pink), between illusion and perception (light blue), and between imagery and illusion (purple), in early epoch of EVC and IPS of Experiment 1. Error bars indicate ±1 SEM. Significance of pairwise comparisons between conditions: *p < 0.05; **p < 0.01; ***p < 0.001. C, Same as in B, but with results from late epoch. D, Same as in A, but with results from the ipsilateral ROIs. E, Same as in B, but with results from the ipsilateral ROIs. F, Same as in C, but with results from the ipsilateral ROIs.
Given that EVC and IPS are highly reciprocally connected, it is possible that any significant decoding result could be a result of spillover effects from downstream ROIs. Nevertheless, we hypothesized that, if a particular brain region served as the source of decodable stimulus-specific signals in a specific condition, it should exhibit higher decoding performance in this condition compared with other conditions. Unlike most previous studies, here we restricted all comparisons between conditions within the same ROI. This procedure ensured that any differences between conditions were attributable to a specific ROI's sensitivity to different conditions, rather than inherent differences between ROIs in information decodability. To test our hypothesis, we first defined an early and a late epoch out of the sample and delay period and, within each epoch, calculated the average decoding accuracy of each condition and ROI for further comparisons (for details, see Materials and Methods). The delayed-recall task allowed for better dissociation between sample-evoked and memory-related signals. As such, results in the early epoch should mainly be accounted for by sample-driven signals, whereas results in the late epoch likely reflected sustained, memory-related signals. During the early epoch (Fig. 2B), a repeated-measures ANOVA demonstrated a significant interaction effect between conditions and ROIs (F(2,32) = 48.09, p < 0.001). Pairwise comparisons further revealed a reversed pattern of accuracy differences between EVC and IPS as predicted: decoding accuracy was higher in perception compared with imagery in EVC (p < 0.001, BF10 = 2761.9), whereas decoding accuracy was higher in imagery in IPS compared with perception (p = 0.004, BF10 = 4.423). 
Additionally, accuracy on illusion trials did not differ from imagery (p = 0.102, BF10 = 0.516), but was significantly lower than perception in EVC (p < 0.001, BF10 = 17,787.7); meanwhile, it did not differ from perception (p = 0.450, BF10 = 0.252), but was significantly lower than imagery in IPS (p = 0.002, BF10 = 7.502). Such a difference supported our hypothesis that illusion shared similar cognitive processes with imagery and perception at different stages of cortical processing. In comparison, during the late epoch, there were no significant differences between conditions (p values > 0.254; Fig. 2C). To summarize, the fact that decoding performance for imagery was worse than perception in EVC, and better than perception and illusion in IPS, supported the notion that IPS might serve as the origin of stimulus-specific signals in imagery.
In Experiment 1, because only two path orientations were used, one might argue that participants simply memorized the symbolic cues at the visual center, rather than imagining the path orientation in the required visual periphery. However, if this were the case, we would expect to see no difference between results in contralateral and ipsilateral ROIs. To address this concern, we examined the extent to which the imagery signal was constrained retinotopically. We repeated the decoding analysis in the corresponding ipsilateral ROIs that received little bottom-up input signal, and found that, although classification performance was significantly above chance in all conditions (p values < 0.001), decoding accuracies did not differ between conditions (p values > 0.058, Fig. 2D–F). In other words, in ROIs outside the corresponding retinotopic areas of perceived/imagined location, the representational strength of path orientation during perception was similar to that during imagery. Our imagery cues were presented at fixation. As such, this retinotopic specificity of imagery provided further support that participants formed content-specific imagery at the corresponding retinotopic location following task instructions. However, we also note that there was a numerical but nonsignificant difference between conditions in the ipsilateral ROIs, in that decoding performance in imagery was numerically higher than in the other two conditions in both ROIs, especially in the late epoch. In other words, it was also possible that imagery-related signals were present in the ipsilateral ROIs, but the signals did not reach statistical significance. To fully address this concern as well as to test the generalizability of our findings, we conducted Experiment 2, in which participants perceived or imagined line orientations that continuously spanned the entire orientation space (1°-180° in steps of 1°).
Experiment 2
Behavioral performance was comparable between perception, imagery, and illusion-based working memory in Experiment 2
Participants (n = 18) performed an adapted version of the delayed-recall task in Experiment 2: the trial structure was similar to that of Experiment 1, except that the sample stimulus set was modified to trigger perceptual, imagined, and illusory experience of line orientations spanning the entire orientation space. The use of an entirely distinct stimulus set allowed us to directly examine whether the reverse pattern between EVC and IPS would hold with simple visual features, rather than the moving objects used in Experiment 1. Specifically, participants viewed a centrally displayed sample stimulus for 1 s (a bar with two end discs on perception trials; two distant discs on imagery trials; two distant discs with opposite openings on illusion trials) and then imagined or remembered the corresponding oriented bar over a 9.5 s delay. After that, participants reported the orientation as precisely as possible (Fig. 1C). Behavioral performance of Experiment 2 largely replicated that of Experiment 1: Participants reported an average VVIQ score of 3.96 ± 0.49, and no significant difference between conditions was observed in terms of recall error (F(2,34) = 0.91, p = 0.413; Fig. 1D) or reaction time (F(2,34) = 1.95, p = 0.158; Table 1).
Orientation reconstructions in IPS were superior for imagery compared with perception in Experiment 2
Because all stimuli were centrally presented, in Experiment 2, we focused all neural analyses on bilateral ROIs. To characterize the neural representations of orientations on a continuous space, we implemented multivariate IEMs (Fig. 3A) to reconstruct population-level orientation representations from voxel-wise brain activity within each ROI (Ester et al., 2015; Yu and Shim, 2017; Yu and Postle, 2021). We first trained a mixed IEM on data from all conditions and then tested it on data from each condition at each time point, which allowed us to compare IEM reconstructions between task conditions in an unbiased manner (Sprague et al., 2018). Figure 3B shows example reconstructed orientation responses from selected time points in EVC and IPS, and the slope of each IEM reconstruction was computed to quantify the representational strength of orientation. Similar to Experiment 1, higher slopes indicate higher representational strength. For all conditions, the strength of orientation reconstructions rose markedly following stimulus presentation and remained above baseline over the delay in both EVC (from sample onset on imagery and perception trials, and from 1.5 s on illusion trials, p values < 0.007) and IPS (from 3 s; p values < 0.009; Fig. 3C), indicating sustained representation of sample orientation in both visual and parietal cortex.
Orientation representations in Experiment 2. A, Illustration of the procedure of IEM. The activation of each voxel was characterized by a linear combination of nine idealized channel tuning functions. A training dataset was used to compute the weight matrix of the channels via least-square linear regression. The obtained weight matrix was used to estimate the channel responses for the testing data. The above steps resulted in responses from nine channels. The procedure was repeated 20 times with the centers of nine tuning functions shifted by 1° on each iteration, resulting in channel responses covering the entire stimulus space from 1° to 180°. Finally, all estimated channel responses were circularly shifted to a common center (0° on the x axis) and averaged across trials. To quantify the strength of IEM reconstructions, the averaged channel responses on both sides of the common center were collapsed over and averaged, and the slope of each collapsed reconstruction was then computed through linear regression. B, Example orientation reconstructions using a mixed-IEM from selected time points with peak IEM reconstructions (4.5 s for EVC and 6 s for IPS), in perception, imagery, and illusion conditions, in EVC and IPS. x axis indicates distance from sample orientations, with 0 representing the sample orientation of each trial. y axis indicates reconstructed orientation channel responses in arbitrary units. C, Time course of orientation reconstruction strength in Experiment 2. Colored lines on top indicate significant time points of the corresponding condition. Vertical gray lines indicate onset of delay (at 1 s) and of probe (at 10.5 s). Horizontal lines indicate baseline of 0. Shaded areas represent error bars (±1 SEM). D, Left, Average orientation reconstruction strength in perception, imagery, and illusion conditions, in early epoch of EVC and IPS of Experiment 2. Error bars indicate ±1 SEM. 
Colored asterisks on top indicate significance of the corresponding condition. Right, Differences in orientation reconstruction strength, between imagery and perception (pink), between illusion and perception (light blue), and between imagery and illusion (purple), in early epoch of EVC and IPS of Experiment 2. Error bars indicate ±1 SEM. Significance of pairwise comparisons between conditions: *p < 0.05; **p < 0.01; ***p < 0.001. E, Same as in D, but with results from late epoch.
Next, similar to Experiment 1, an early and a late epoch were defined to better characterize the representational differences in visual and parietal cortex. During the early epoch, the interaction between ROI and condition was significant (F(2,34) = 10.20, p < 0.001; Fig. 3D). Subsequent pairwise comparisons confirmed the same reversed pattern in EVC and IPS between conditions as in Experiment 1, with EVC demonstrating stronger orientation representations on perception compared with imagery and illusion trials, and IPS demonstrating stronger orientation representations on imagery compared with perception and illusion trials (p values < 0.032). Moreover, orientation representational strength on illusion trials was more comparable to that on imagery trials in EVC (p values > 0.307, BF10 early = 0.243, BF10 late = 0.271), and was more comparable to that on perception trials in IPS (p values > 0.212, BF10 early = 0.287, BF10 late = 0.349) in both epochs, confirming that IPS differentiated between subjectively internal (i.e., imagery) and external (i.e., illusion and perception) experiences. Interestingly, during the late epoch (Fig. 3E), the main effect of condition was significant (F(2,32) = 6.09, p = 0.006), and orientation representations on imagery trials were stronger than those on perception trials in both EVC and IPS (p values < 0.003), and were stronger than those on illusion trials in IPS (p = 0.019). These results indicated that the enhanced imagery representations in IPS persisted into the late epoch, possibly implicating a sustained parietal signal in maintaining internally generated imagery in the task of Experiment 2. Although the comparisons related to the illusion condition were primarily based on the absence of significant decoding differences, our BF analysis provided complementary evidence that these nonsignificant results were more likely to reflect true null effects than insufficient statistical power in the tests.
Enhanced orientation representations of imagery cannot be explained by univariate BOLD differences
Having identified enhanced orientation representations in IPS for imagery at the multivariate level in both Experiments 1 and 2, we asked whether this result could be accounted for by differences in univariate BOLD activity. In Experiment 1, neither epoch demonstrated a significant interaction effect in BOLD activity (F values < 0.62, p values > 0.652), and sample presentation evoked stronger BOLD activity on perception and illusion compared with imagery trials in both ROIs (t values > 2.59, p values < 0.015; Fig. 4A–C). In Experiment 2, both epochs demonstrated significant interaction effects (F values > 4.40, p values < 0.021). Nevertheless, the three conditions did not differ (t values < 1.85, p values > 0.227), except for a slightly higher response on imagery and illusion trials in IPS during the early epoch (t values > 2.58, p values < 0.029; Fig. 4D–F). In other words, multivariate and univariate analyses exhibited distinct patterns of condition differences, implying that our decoding results were unlikely to be explained by univariate BOLD activation differences between conditions.
Univariate BOLD activity. A, Time course of BOLD signal change in perception (blue), imagery (red), and illusion (green) conditions, in EVC and IPS of Experiment 1. ROIs were contralateral to the side of stimulus presentation. Colored lines on top indicate significant time points of the corresponding condition. Vertical gray lines indicate onset of delay (at 2 s) and of probe (at 10.5 s). Horizontal lines indicate baseline of 0. Shaded areas represent error bars (±1 SEM). B, Average BOLD signal change in perception, imagery, and illusion conditions, in early (left) and late (right) epochs of EVC and IPS of Experiment 1. Error bars indicate ±1 SEM. Colored asterisks on top indicate significance of the corresponding condition. C, Differences in BOLD signal change, between imagery and perception (pink), between illusion and perception (light blue), and between imagery and illusion (purple), in early (left) and late (right) epochs of EVC and IPS of Experiment 1. Error bars indicate ±1 SEM. Significance of pairwise comparisons between conditions: *p < 0.05; **p < 0.01; ***p < 0.001. D, Similar to A, but with results from Experiment 2. Vertical gray lines indicate onset of delay (at 1 s) and of probe (at 10.5 s). E, Similar to B, but with results from Experiment 2. F, Similar to C, but with results from Experiment 2.
This was further validated by a series of control analyses: in Experiment 1, when we removed the mean BOLD activity from each condition and repeated the decoding analysis, similar patterns of results remained (Fig. 5A,B). Likewise, controlling for the mean BOLD difference between left and right orientations within condition did not change the patterns of results either (Fig. 5C,D), suggesting that the decoding differences were exclusively linked to neural representations encoded in multivariate activation patterns. In Experiment 2, again when we removed the mean BOLD activity between conditions (Fig. 5E,F), the results largely remained (p values < 0.036, except for early epoch of IPS: imagery vs perception, p = 0.092, imagery vs illusion, p = 0.346). Hence, again the differences in representational strength across conditions could not simply be explained by differences in mean BOLD activity.
Orientation representational strength in Experiments 1 and 2 after controlling for differences in univariate BOLD activity. A, Left, Average decoding accuracy in perception (blue), imagery (red), and illusion (green) conditions, in early epoch of EVC and IPS, after removing mean differences in BOLD activity between conditions. ROIs were contralateral to the side of stimulus presentation. Horizontal dashed lines indicate chance level of 0.5. Error bars indicate ±1 SEM. Colored asterisks on top indicate significance of the corresponding condition. Right, Differences in decoding accuracy, between imagery and perception (pink), between illusion and perception (light blue), and between imagery and illusion (purple), in early epoch of EVC and IPS of Experiment 1. Error bars indicate ±1 SEM. Significance of pairwise comparisons between conditions: *p < 0.05; **p < 0.01; ***p < 0.001. B, Same as in A, but with results from late epoch. C, Same as in A, but with mean differences in BOLD activity between two path orientations within each condition removed. D, Same as in B, but with mean differences in BOLD activity between two path orientations within each condition removed. E, Similar to A, but with orientation reconstruction strength from Experiment 2 in early epoch. F, Similar to E, but with results from late epoch.
Enhanced orientation representations of imagery cannot be explained by differences in eye movement patterns
In Experiment 2, sample stimuli on imagery trials consisted of two distant discs without a line presented at the fovea. This raised the possibility that participants made more eye movements toward the distant discs along the target orientation, thereby resulting in better neural representations of orientations on imagery trials. This was unlikely, given that better orientation representation was not observed in illusion, which should also have been influenced by any such eye movements. Nevertheless, to rule out this possibility, we conducted a behavioral control experiment in which we recruited a new group of volunteers (n = 13) to perform the same task as Experiment 2 with a shorter delay of 2.4 s. Decoding of orientations from eye position data revealed no significant differences between conditions (F(2,24) = 1.06, p = 0.362; Fig. 1E), suggesting that our neural IEM results were unlikely to be contaminated by differences in stimulus-evoked eye movements.
Enhanced orientation representations of imagery remained with a retrocue design
In both experiments, only one sample was presented and the sample differed in appearance between conditions, thereby resulting in inherently differential sample-evoked activity. Although we have demonstrated, through a series of univariate and multivariate analyses across ROIs, that our results were unlikely to be explained by physical differences in samples, we nevertheless performed a preliminary study with a small group of participants (n = 5): the experimental procedure was identical to that in Experiment 2, except that a retrocue paradigm was used. Participants viewed two sequentially presented samples at the beginning of each trial, and were retrocued on the to-be-maintained sample shortly after that. This retrocue design ensured that participants performed internal selection on the to-be-maintained orientation, rather than simply relying on stimulus differences between conditions. While this type of design significantly reduced stimulus-driven effects in EVC, the difference between imagery and perception remained in IPS; that is, orientation representation in imagery remained stronger than that in perception in IPS in both epochs (p values < 0.001, Fig. 6). These results again highlighted the crucial role of IPS in generating stimulus-specific representations in imagery.
Orientation reconstructions in the retrocue version of Experiment 2 (n = 5). A, Time course of orientation reconstruction strength using a mixed-IEM. Colored lines on top indicate significant time points of the corresponding condition. Vertical gray lines indicate onset of delay (at 3 s) and of probe (at 12 s). Horizontal lines indicate baseline of 0. Shaded areas represent error bars (±1 SEM). B, Average orientation reconstruction strength in perception, imagery, and illusion conditions, in early (left) and late (right) epochs of EVC and IPS. Error bars indicate ±1 SEM. Colored asterisks on top indicate significance of the corresponding condition. C, Differences in orientation reconstruction strength, between imagery and perception (pink), between illusion and perception (light blue), and between imagery and illusion (purple), in early (left) and late (right) epochs of EVC and IPS. Error bars indicate ±1 SEM. Significance of pairwise comparisons between conditions: *p < 0.05; **p < 0.01; ***p < 0.001.
The representational format of orientation representations
In addition to decoding performance at each time point, we further investigated the time-varying population dynamics in each condition, by examining the temporal generalization decoding patterns within each condition. In Experiment 1, we found that, in EVC, neural codes in imagery and illusion were stable over time, while neural codes in perception were more dynamic, and underwent a clear transition from sample to delay period. In IPS, a similar transition was observed for imagery but not for perception or illusion (Fig. 7A). These changes in population dynamics may constitute another neural signature that underlies the generation of imagery signals. In Experiment 2, however, neural codes in all three conditions remained stable over time, even in the perception condition (Fig. 7C; this point will be elaborated in Discussion).
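For readers unfamiliar with temporal generalization, the following sketch illustrates the idea: a classifier is trained at one time point and tested at every time point, so that stable codes generalize off the diagonal and dynamic codes do not. The nearest-centroid classifier, the interleaved fold assignment, and the simulated data are assumptions made for illustration and are not the study's actual pipeline.

```python
import numpy as np

def nearest_centroid_accuracy(train_x, train_y, test_x, test_y):
    """Classify test patterns by their closest class centroid (Euclidean)."""
    classes = np.unique(train_y)
    centroids = np.stack([train_x[train_y == c].mean(axis=0) for c in classes])
    dist = ((test_x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    predicted = classes[dist.argmin(axis=1)]
    return float((predicted == test_y).mean())

def temporal_generalization(data, labels, n_folds=5):
    """Cross-validated train-time x test-time accuracy matrix.

    data: (n_trials, n_voxels, n_times); labels: (n_trials,).
    Rows index train time and columns index test time.
    """
    n_trials, _, n_times = data.shape
    folds = np.arange(n_trials) % n_folds  # simple interleaved folds
    accuracy = np.zeros((n_times, n_times))
    for fold in range(n_folds):
        test = folds == fold
        train = ~test
        for t_train in range(n_times):
            for t_test in range(n_times):
                accuracy[t_train, t_test] += nearest_centroid_accuracy(
                    data[train, :, t_train], labels[train],
                    data[test, :, t_test], labels[test])
    return accuracy / n_folds

# Simulated data (assumption): one stable code, and one code that switches
# to a non-overlapping voxel pattern halfway through the trial.
rng = np.random.default_rng(0)
n_trials, n_voxels, n_times = 60, 20, 6
labels = np.arange(n_trials) % 2
sign = (2 * labels - 1)[:, None, None]  # -1 or +1 per trial

pattern_a = np.zeros(n_voxels)
pattern_a[:10] = 1.0
pattern_b = np.zeros(n_voxels)
pattern_b[10:] = 1.0
stable_code = np.repeat(pattern_a[:, None], n_times, axis=1)
dynamic_code = stable_code.copy()
dynamic_code[:, 3:] = pattern_b[:, None]

def simulate(code):
    noise = rng.normal(size=(n_trials, n_voxels, n_times))
    return sign * code[None, :, :] + noise

stable_tg = temporal_generalization(simulate(stable_code), labels)
dynamic_tg = temporal_generalization(simulate(dynamic_code), labels)
```

In this toy case, `stable_tg` is high across the whole matrix, whereas `dynamic_tg` is high only within the early and late blocks, which is the signature of a transition from one neural code to another.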
Temporal generalization and cross-condition results. A, Temporal generalization results in perception, imagery, and illusion conditions, in EVC and IPS of Experiment 1. ROIs were contralateral to the side of stimulus presentation. x and y axes indicate test and train time, respectively. White dashed lines indicate clusters with significant orientation representations after correction. B, Average decoding accuracy in cross-condition decoding of Experiment 1 when classifier was trained on one of the three conditions, and tested on all conditions, in EVC and IPS, for early epoch and late epoch separately. Colored asterisks on top indicate significance of decoding performance of the corresponding test condition: *p < 0.05; **p < 0.01; ***p < 0.001. Horizontal dashed lines indicate chance level of 0.5. Error bars indicate ±1 SEM. C, Similar to A, but with results from Experiment 2. D, Similar to B, but with average orientation reconstruction strength in cross-condition IEM from Experiment 2.
In addition, to compare the representational format of orientations between conditions, we conducted a cross-condition decoding analysis, during which the classifier was trained on data from one condition and tested on the other two conditions separately for each epoch. We found that all three conditions were mutually generalizable in all ROIs during both epochs (p values < 0.018), in both Experiment 1 (Fig. 7B) and Experiment 2 (Fig. 7D). This suggested perception, imagery, and illusion shared common stimulus-specific representations in both EVC and higher-order IPS, although the relative representational strength differed between conditions within each ROI.
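A minimal sketch of the cross-condition scheme is given below, assuming a nearest-centroid classifier and simulated patterns that share one orientation-related pattern at different gains; the classifier choice, the function names, and the data are hypothetical and serve only to show the train-on-one, test-on-the-others logic.

```python
import numpy as np

def fit_centroids(x, y):
    """Return class labels and their mean patterns."""
    classes = np.unique(y)
    centroids = np.stack([x[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(classes, centroids, x):
    """Assign each pattern to the class with the nearest centroid."""
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def cross_condition_decoding(data, labels):
    """Train on each condition and test on every other condition.

    data:   dict mapping condition -> (n_trials, n_voxels) array.
    labels: dict mapping condition -> (n_trials,) orientation labels.
    Returns a dict mapping (train_condition, test_condition) -> accuracy.
    """
    results = {}
    for train_cond in data:
        classes, centroids = fit_centroids(data[train_cond], labels[train_cond])
        for test_cond in data:
            if test_cond == train_cond:
                continue  # within-condition decoding needs cross-validation
            predicted = predict(classes, centroids, data[test_cond])
            results[(train_cond, test_cond)] = float(
                (predicted == labels[test_cond]).mean())
    return results

# Simulated data (assumption): a shared pattern expressed at different gains.
rng = np.random.default_rng(1)
shared_pattern = rng.normal(size=40)
gains = {"perception": 1.0, "imagery": 0.6, "illusion": 0.8}
labels = {cond: np.arange(40) % 2 for cond in gains}
data = {
    cond: gain * np.outer(2 * labels[cond] - 1, shared_pattern)
    + rng.normal(size=(40, 40))
    for cond, gain in gains.items()
}
accuracies = cross_condition_decoding(data, labels)
```

When the conditions share a representational format, all six train-test pairs generalize above chance even though the gain (and hence the within-condition representational strength) differs.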
In Experiment 2, we collected data from an additional perception-mapping task on orientations, which allowed us to train a fixed model using the perception-mapping data and make comparisons between conditions and epochs based on their neural similarities to perception. To be specific, we trained another IEM on data from the independent orientation perception-mapping task, and tested it on data from each of the three conditions. Overall, results from this perception-mapping IEM were reminiscent of those from the mixed IEM: orientation representations were significant throughout the trial in EVC (p values < 0.001) and IPS (p values < 0.034), suggesting shared representations between internally generated and externally stimulated signals, and also between maintained and perceived signals (Fig. 8A,B). Patterns of representational differences also largely replicated the main findings (p values < 0.049, except for early epoch of IPS: imagery vs illusion, p = 0.219; Fig. 8C,D). Stronger orientation representation on imagery trials was still observed in IPS when using the perception-mapping IEM. Together, these results further confirmed that IPS was more involved in imagery compared with perception.
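The fixed-model logic can be sketched as follows: channel weights are estimated once from an independent mapping dataset and then inverted on data from each condition. The basis set (nine rectified cosines raised to the eighth power), the least-squares estimation, and the summary index (centered channel response minus the mean of the remaining channels) are illustrative assumptions and should not be read as the study's exact implementation or its reconstruction strength metric.

```python
import numpy as np

N_CHANNELS = 9  # assumed number of orientation channels

def basis_responses(orientations_deg, n_channels=N_CHANNELS, power=8):
    """Idealized channel responses, shape (n_trials, n_channels).

    |cos| has a period of 180 degrees, matching orientation space.
    """
    centers = np.arange(n_channels) * 180.0 / n_channels
    oris = np.asarray(orientations_deg, dtype=float)
    delta = np.deg2rad(oris[:, None] - centers[None, :])
    return np.abs(np.cos(delta)) ** power

def train_iem(train_bold, train_orientations_deg):
    """Estimate channel-to-voxel weights, shape (n_channels, n_voxels)."""
    design = basis_responses(train_orientations_deg)
    weights, *_ = np.linalg.lstsq(design, train_bold, rcond=None)
    return weights

def invert_iem(weights, test_bold):
    """Reconstruct channel responses, shape (n_trials, n_channels)."""
    return test_bold @ np.linalg.pinv(weights)

def reconstruction_strength(channel_resp, orientations_deg,
                            n_channels=N_CHANNELS):
    """Hypothetical index: centered channel minus mean of other channels."""
    step = 180.0 / n_channels
    nearest = np.round(np.asarray(orientations_deg) / step).astype(int)
    nearest = nearest % n_channels
    center = n_channels // 2
    # Shift each trial so the channel at the sample orientation is centered.
    aligned = np.stack([np.roll(resp, center - k)
                        for resp, k in zip(channel_resp, nearest)])
    profile = aligned.mean(axis=0)
    return float(profile[center] - np.delete(profile, center).mean())

# Simulated data (assumption): voxels driven by the channels at two gains.
rng = np.random.default_rng(0)
n_voxels = 50
true_weights = rng.normal(size=(N_CHANNELS, n_voxels))
oris = np.tile(np.arange(N_CHANNELS) * 20.0, 20)

def simulate(gain):
    signal = gain * (basis_responses(oris) @ true_weights)
    return signal + rng.normal(scale=0.2, size=(oris.size, n_voxels))

weights = train_iem(simulate(1.0), oris)  # stands in for the mapping task
strong = reconstruction_strength(invert_iem(weights, simulate(1.0)), oris)
weak = reconstruction_strength(invert_iem(weights, simulate(0.3)), oris)
```

Because the weights are fixed, differences in the index between test datasets reflect how strongly each dataset expresses the perception-like code, which is the rationale for comparing conditions and epochs against a common model.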
Orientation representations in Experiment 2 with a perception-mapping IEM. A, Example orientation reconstructions using a perception-mapping IEM from selected time points with peak IEM reconstructions (4.5 s for EVC and 6 s for IPS for main task, 4.5-6 s for perception mapping), in perception, imagery, and illusion conditions, in EVC and IPS. x axis indicates distance from sample orientations, with 0 representing the sample orientation of each trial. y axis indicates reconstructed orientation channel responses in arbitrary units. B, Time course of orientation reconstruction strength in Experiment 2. Colored lines on top indicate significant time points of the corresponding condition. Vertical gray lines indicate onset of delay (at 1 s) and of probe (at 10.5 s). Horizontal lines indicate baseline of 0. Shaded areas represent error bars (±1 SEM). C, Left, Average orientation reconstruction strength in perception, imagery, and illusion conditions, in early epoch of EVC and IPS of Experiment 2. Error bars indicate ±1 SEM. Colored asterisks on top indicate significance of the corresponding condition. Right, Differences in orientation reconstruction strength, between imagery and perception (pink), between illusion and perception (light blue), and between imagery and illusion (purple), in early epoch of EVC and IPS of Experiment 2. Error bars indicate ±1 SEM. Significance of pairwise comparisons between conditions: *p < 0.05; **p < 0.01; ***p < 0.001. D, Same as in C, but with results from late epoch.
Relative representational strength changed along the retinotopic hierarchy. A, Differences in orientation representational strength between imagery and perception in retinotopic ROIs (V1-V3, V3AB, IPS0-5) along the EVC-IPS retinotopic hierarchy. Top row, Orientation decoding results from Experiment 1. Bottom row, Orientation reconstruction results from Experiment 2. Significant difference in the corresponding ROI: *p < 0.05; **p < 0.01; ***p < 0.001. All p values remain uncorrected. Error bars indicate ±1 SEM. B, Similar to A, but with results extended into the ventral visual stream (V1-V3, hV4, VO1, VO2).
Emergence of imagery dominance along the posterior-to-anterior cortical hierarchy
We have provided converging evidence that IPS held more pronounced orientation representations during imagery compared with perception and illusion. How did this imagery dominance emerge along the posterior-to-anterior cortical hierarchy, given that it was absent in sensory-driven EVC? To address this question, we defined a set of functionally activated, retinotopic ROIs from EVC to IPS, including V1, V2, V3, V3AB, IPS0, IPS1, IPS2, IPS3, IPS4, and IPS5. In each retinotopic ROI, we calculated a difference index, between decoding accuracies of imagery and perception for Experiment 1, and between reconstruction strength of the two for Experiment 2. As depicted in Figure 9A, imagery dominance developed gradually along the EVC-IPS hierarchy in the early epoch in both experiments. Critically, an equilibrium point between imagery and perception was reached in V3AB, which lies between EVC and IPS. In the late epoch of both experiments, enhanced orientation representations were observed during imagery in subregions of IPS, as well as in V3AB and/or EVC, suggesting an overall sustained enhancement of orientation representations in these regions during the maintenance of imagery information.
Only parietal cortex demonstrated a domain-general role in imagery
We have demonstrated superior orientation representations in imagery in parietal cortex across distinct stimulus sets (moving objects vs visual feature), implicating a domain-general role of parietal cortex in imagery. How specific was this effect to parietal cortex? To address this question, we performed two additional analyses: first, we repeated the analysis in a series of ROIs along the ventral visual stream (hV4, VO1, VO2), and found that enhanced representations for imagery were present in Experiment 1 where objects were used as stimuli, but were absent in Experiment 2 where only simple visual features were imagined (Fig. 9B). Likewise, because moving objects were used in Experiment 1, we analyzed results from the motion-selective MT+ in both experiments. Similar to what we observed in the ventral visual stream, enhanced representations for imagery were found in MT+ in Experiment 1 (p < 0.001, Fig. 10A) but not in Experiment 2 (p = 0.365, Fig. 10B). These results together confirmed that object-selective and motion-selective visual areas exhibited a domain-specific role in imagery, as opposed to the domain-general role of parietal cortex.
Orientation representations in MT+ and sPCS in Experiments 1 and 2. A, Left, Time course of decoding accuracy in perception (blue), imagery (red), and illusion (green) conditions, in MT+ of Experiment 1. Colored lines on top indicate significant time points of the corresponding condition. Vertical gray lines indicate onset of delay (at 2 s) and of probe (at 10.5 s). Horizontal lines indicate chance level of 0.5. Shaded areas represent error bars (±1 SEM). Middle, Average decoding accuracy in perception, imagery, and illusion conditions. Horizontal dashed lines indicate chance level of 0.5. Error bars indicate ±1 SEM. Colored asterisks on top indicate significance of the corresponding condition. Right, Differences in decoding accuracy, between imagery and perception (pink), between illusion and perception (light blue), and between imagery and illusion (purple). Error bars indicate ±1 SEM. Significance of pairwise comparisons between conditions: *p < 0.05; **p < 0.01; ***p < 0.001. B, Similar to A, but with results of orientation reconstruction strength using a mixed-IEM in MT+ of Experiment 2. Left, Vertical gray lines indicate onset of delay (at 1 s) and of probe (at 10.5 s). Horizontal lines indicate baseline of 0. C, Similar to A, but with results of sPCS in Experiment 1. D, Similar to C, but with results of sPCS in Experiment 2.
Second, we examined an additional ROI that lies more anterior to parietal cortex (i.e., the sPCS in frontal cortex) (Sprague and Serences, 2013; Yu and Shim, 2017), to see whether the observed patterns were widespread across multiple higher-order brain regions. We found that sPCS failed to demonstrate an analogous pattern in either Experiment 1 (p values > 0.148, Fig. 10C) or Experiment 2 (p values > 0.249 except for imagery vs illusion in early epoch: p = 0.040; Fig. 10D). To summarize, the absence of differences in the strength of stimulus representations in brain regions higher in the cortical hierarchy than IPS, together with the positive evidence in IPS, implied a unique role of parietal cortex in forming stimulus-specific imagery.
Discussion
Previous neuroimaging work has demonstrated shared neural codes between visual imagery and perception in EVC (Albers et al., 2013; Ragni et al., 2020). However, it had remained less clear how stimulus-specific representations of imagery and perception differed from each other. Here we demonstrated superior stimulus-specific representations of perception compared with those of imagery in EVC, consistent with the role of EVC in processing bottom-up external signals. By contrast, stimulus-specific representations of imagery were superior to those of perception in IPS. By the same logic, we interpreted this difference as reflecting a potential role of IPS in generating stimulus-specific imagery signals. We further demonstrated that this imagery dominance gradually developed along a functional gradient from EVC to IPS, with V3AB as an equilibrium point in between. Moreover, these results cannot be explained by differences in univariate BOLD activations between conditions, nor by influences from eye movements. Additionally, using a separate perception-mapping task to reconstruct orientation representations, and repeating Experiment 2 using a retrocue design, both yielded qualitatively similar results. Last, object/motion-selective regions demonstrated a domain-specific role as opposed to IPS, and examination of a more anterior frontal region, sPCS, failed to replicate findings in IPS. To summarize, through a series of rigorous control experiments and analyses, we validated our findings of a unique imagery dominance in IPS.
Top-down modulation during imagery from parietal cortex has been supported by diverse neuroimaging evidence: parietal cortex shows consistent activation in imagery (Ishai et al., 2000; Ganis et al., 2004), content-independent connectivity with occipitotemporal areas (Mechelli et al., 2004), and correlation between BOLD activity and behavioral performance (Ragni et al., 2020). Nevertheless, the exact information being relayed from parietal cortex remains unclear. In particular, it has remained unclear how imagery representations in parietal cortex differ from those of perception (Albers et al., 2013; Ragni et al., 2020). Here, leveraging multivariate classification and IEMs, we provided converging evidence that imagery yielded significantly enhanced stimulus-specific representations in IPS compared with perception. The observation of robust imagery representations in IPS is in line with recent working memory studies demonstrating robust representations of memory contents in IPS (Ester et al., 2015; Yu and Shim, 2017; Rademaker et al., 2019; Yu and Postle, 2021), and provides new insights into the nature of such representations: first, stimulus-specific representations in IPS were not epiphenomenal, but rather functionally relevant and varied in strength with regard to the internality of information; second, enhanced representations of imagery emerged early in the trial, suggesting an internal origin of the representations from IPS; third, stimulus-specific representations in IPS, regardless of their exact format, were fine-tuned enough to allow successful reconstructions from a large set of orientations. Yet, it is noteworthy that top-down signals are involved in both imagery and perception, although stronger top-down modulation from parietal cortex was observed during imagery in the current study.
Unlike previous work, the current study included a delay period in all conditions. The memory delay allowed better dissociation between stimulus-evoked and memory-related signals over time. With this design, we were able to localize the representational differences to the sample epoch in both experiments. Moreover, similarly enhanced representations of imagery extended into the late delay epoch, suggesting that IPS is involved in both the generation and maintenance of imagery contents. Although this imagery dominance during delay was relatively weaker and was only present in subregions of IPS in Experiment 1, it should be noted that a lack of imagery dominance during delay does not rule out a contribution of IPS to imagery maintenance. Rather, we interpret the prolonged effect in Experiment 2 as reflecting sustained efforts of IPS in maintaining imagery, possibly because of increased task difficulty caused by the large number of orientations used. Another advantage of including a delay was that it allowed examination of the time-varying population dynamics of each condition (Spaak et al., 2017). Intriguingly, we observed a stable neural code across sample and delay periods in imagery and illusion in both experiments and in perception in Experiment 2, but a highly dynamic neural code in perception in Experiment 1. Perhaps relatedly, in Experiment 2, imagery and perception shared neural representations even in IPS, which was not the case between working memory and perception in several previous studies (Rademaker et al., 2019; Yu and Shim, 2019). We speculate that this apparent difference could be due to stimulus differences: we used more simplified stimuli (oriented lines) rather than gratings as in previous studies. The simplified stimuli might facilitate neural similarities in representing imagined and perceived contents (Kwak and Curtis, 2022), thereby resulting in better cross-condition and cross-time generalization.
When using more complex stimuli like those in Experiment 1, we observed a clear transition in neural code from sample to delay period in perception but not in imagery or illusion. Given that sample stimuli differed greatly in appearance in Experiment 1, it could be that the neural code during sample period reflected pure sensory-driven signals, and assimilated into a common neural code with imagery and illusion during delay. This account would be consistent with previous work demonstrating an absence of cross-decoding between physical and illusory stimuli in EVC when no memory was required (Liu et al., 2019). This difference in temporal dynamics might reflect another important neural signature that distinguishes imagery from perception.
Illusion can be conceptualized as misperception or involuntary imagery (Pearson and Westbrook, 2015; Pearson, 2019) within different theoretical frameworks. On one hand, illusion produces a subjective experience similar to that of veridical perception, namely, that of “information being perceived externally”; on the other hand, both illusory and imagined experiences arise when the information is not directly accessible externally, yet illusory experience lacks the subjective feeling of “information being generated internally.” As such, illusion could serve as an ideal tool to dissociate subjective and objective internality. We found that, in EVC, illusion was more comparable to imagery in terms of representational strength, reflecting a major contribution of physical sensory inputs regardless of subjective experience. In IPS, by contrast, the representational strength of orientation during illusion resembled that during perception to a larger extent, reflecting a dissociation between subjectively external and internal experiences. We argue that this result was unlikely to be due to an overall reduction in the strength of orientation representations during illusion, because representational strength in illusion did not significantly differ from that in imagery in EVC, nor from that in perception in IPS. In Experiment 2, representational strength in illusion was even numerically (but not statistically) higher than in perception in several analyses. Collectively, these findings provided further support for our hypothesis that IPS carries neural signals that discriminate between subjectively internal and external experiences, which could be a critical reason why imagery and perception feel so different.
Meanwhile, although we primarily focused on the role of parietal cortex in differentiating between subjectively internal and external experiences, we would like to note that our current results are also consistent with previous work demonstrating a role of EVC in representing subjective sensory strength, because illusion is typically perceived as weaker (although real) compared with veridical perception. Indeed, EVC and IPS may play a joint role in this process, and determine subjective experience in a collaborative manner.
Our results are broadly in line with the reverse hierarchy hypothesis, which proposes that imagery involves a reverse cortical hierarchy as opposed to perception, wherein signals originate from high-level frontoparietal and visual cortex and backpropagate to lower-level visual areas (Hochstein and Ahissar, 2002; Pearson, 2019). In addition, our findings add to the theory in several respects: first, most previous evidence in support of the reverse hierarchy focused on representational differences between low-level and high-level visual cortex along the ventral visual stream (Lee et al., 2012; Horikawa and Kamitani, 2017; Dijkstra et al., 2018, 2020); critically, representational strength of imagery never exceeded that of perception within the same brain region in those studies. Here, by having participants imagine or perceive orientations, we attempted to minimize decoding difficulties between brain regions, thereby providing the first evidence for a representational “flip” between imagery and perception in parietal cortex. Second, we also observed enhanced imagery representations in object-selective visual cortex and motion-selective MT+ (Kaas et al., 2010) when using complex moving Gabors in Experiment 1. By contrast, in Experiment 2 when using simple features, the object/motion-related hierarchies disappeared, while the EVC-and-IPS hierarchy remained. These results suggested that the source region of domain-specific imagery could be distributed in multiple areas and located flexibly in the cortical hierarchy, depending on the type of imagery contents. In contrast, parietal cortex is involved in not only imagery of spatial information (Sack et al., 2005; Winlove et al., 2018), but also imagery of nonspatial stimuli, such as features (Yu and Postle, 2021), objects (Dijkstra et al., 2017; Ragni et al., 2020, 2021), and pictures (Breedlove et al., 2020).
Therefore, parietal cortex may play a more domain-general role in visual imagery of various types of information. Whether the imagery dominance in parietal cortex generalizes to more types of information remains to be further elucidated in future research.
Footnotes
This work was supported by Ministry of Science and Technology of China STI2030-Major Projects 2021ZD0204202 and 2021ZD0203701; National Natural Science Foundation of China 32271089; Shanghai Pujiang Program 22PJ1414400; CAS Project for Young Scientists in Basic Research YSBR-071; and Shanghai Municipal Science and Technology Major Project 2018SHZDZX05. We thank Dr. Patrick Cavanagh for valuable comments on the manuscript; and Yiheng Hu for sharing part of the analysis codes.
The authors declare no competing financial interests.
Correspondence should be addressed to Qing Yu at qingyu{at}ion.ac.cn