Abstract
How is spatial attention deployed in mental images? Mental imagery is often assumed to share mechanisms with visual perception and visual working memory. Top-down, endogenous spatial attention in both visual perception and working memory modulates behavior and parieto-occipital alpha-band activity. However, working memory captures only a subset of mental imagery, which can also draw upon long-term memory. Here, we ask whether and how spatial attention operates in mental images derived from general knowledge in long-term memory and whether it recruits the same neural mechanisms as visual perception. We recorded EEG in 28 healthy volunteers (13 males, 15 females) as they performed two discrimination tasks with spatial cues (70% valid): one involving the mental visualization of a long-term memory map (a map of France) and the other using visual stimuli. We show that spatial attention shortens response times in both tasks, but through distinct mechanisms. Behavioral attentional benefits were uncorrelated across tasks, and spatial attention in mental imagery engaged distinct neural mechanisms, with frontal rather than posterior alpha activity modulation. We further reveal fundamental differences in the spatial structures of mental imagery and visual perception. Altogether, our results show that mental images drawn from long-term semantic memory are spatially organized and are amenable to spatial attention deployment, but the underlying neural mechanisms differ from those of visual perception. Our results thus point to marked differences between mental imagery from long-term memory and visual perception.
Significance Statement
Do we orient attention in the mind’s eye as we do in visual perception? For mental images held in working memory, short-lived and percept-like, this seems to be the case. Yet many forms of imagery, such as imagining a map, draw on long-term memory. Our study reveals that spatial attention can also operate within mental images based on long-term memory, but through neural mechanisms distinct from visual perception. Additionally, we show that while these images possess a spatial organization, it does not replicate the spatial format of visual percepts. These findings challenge the assumption that mental imagery simply reuses perceptual processes.
Introduction
How do we orient spatial attention within mental images? In perception, voluntarily orienting spatial attention modulates performance in visual tasks (Posner, 1980; Rizzolatti et al., 1987). This top-down process is accompanied by characteristic changes in parieto-occipital alpha-band activity, whose topography indicates the locus of attention (Worden et al., 2000; Thut et al., 2006; Gould et al., 2011). Similar behavioral and electrophysiological effects have also been reported when attention is oriented within working memory (Griffin and Nobre, 2003; Nobre et al., 2007; Rösner et al., 2020; Sutterer et al., 2021). Mental imagery shares properties and mechanisms with both perception and visual short-term memory (Tong, 2013; Dijkstra et al., 2019; Pearson, 2019). It follows that spatial attention orientation might be similar in perception, short-term memory, and mental imagery. However, mental imagery extends beyond short-term visual maintenance, engaging other memory systems (Dijkstra et al., 2019; Spagna et al., 2024) and potentially relying on formats distinct from working memory and perception. Here, we ask whether and how spatial attention operates within mental images derived from general knowledge, also known as semantic memory, and whether the underlying mechanisms resemble those of visual perception.
Spatial attention in mental imagery might depend on the spatial format of mental images, a topic that has been a subject of debate for decades. Some argue that mental images have a spatial format analogous to visual perception (Kosslyn, 1975; Pearson et al., 2015; Pearson and Kosslyn, 2015; Pearson, 2019), supported by experimental evidence that the spatial structure of internal representations influences response time during navigational decision-making (Fernandez Velasco et al., 2025). In mental scanning tasks, the time required to scan a mental image correlates with the Euclidean distances it represents, whether in working memory (Kosslyn et al., 1978; Finke and Pinker, 1982) or based on verbal descriptions (Denis et al., 1995). These format similarities are also observed in the neural architecture of imagery, which frequently overlaps with perceptual circuits and preserves spatial topography (Kosslyn et al., 1993; Thirion et al., 2006; Naselaris et al., 2015; Dijkstra et al., 2017, 2019; Senden et al., 2019). Other authors caution against assuming equivalence between imagery and perception, highlighting fundamental differences in their mechanisms (Pylyshyn, 2003; Spagna et al., 2021, 2024). Crucially, commonalities between imagery and perception may depend on the type of mental image. Mental images associated with working memory are often detailed and anchored to recent perceptual input (Dijkstra, 2024), whereas images retrieved from long-term memory may be more flexible and incomplete (Bigelow et al., 2023), involving nonspatial, semantically encoded information. For these, the degree of overlap with perception remains unclear. In this study, we asked participants to draw on their long-term memory to imagine the map of France, an image with inherently spatial structure (Bourlon et al., 2011; Guariglia et al., 2013; Boccia et al., 2015). We could thus investigate the spatial properties of such a mental image, as well as its analogy to visual perception.
We designed two spatial attention tasks presented in alternating blocks: one based on mental imagery and the other on visual perception (Fig. 1A). In the Imagery task, participants imagined a map of France and were instructed with a spatial cue to direct their attention to one side (east or west). They then chose, from two city names presented on screen, the one closest to Paris. In the Perception task, participants viewed the same spatial attentional cues but were presented with two dots and had to choose which of the two was closer to fixation. Dot positions were derived from city locations and dynamically adjusted to match difficulty between tasks. We tested whether valid cues shortened reaction times in Imagery, as in Perception, and whether attentional orienting modulated alpha-band activity in the EEG (8–12 Hz). The design further allowed us to assess the spatial organization of mental imagery compared with perception. More precisely, we evaluated how task-relevant (distance from Paris/fixation) and irrelevant spatial parameters (eccentricity, separation) influenced response times (Fig. 1B).
Experimental paradigm (block design). A, Time course of a trial in the Mental Imagery (top) and Visual Perception (bottom) tasks. In the Imagery task, participants were instructed to imagine a map of France at the start of each trial. They were then presented with a cue indicating whether they should mentally direct their attention to the east or west side of the map. After a 1.2 s delay, they were asked to indicate, as accurately as possible, which of the two French cities displayed (both located either west or east of Paris) was closest to Paris, by pressing the corresponding up or down key. The northernmost city was always displayed above the fixation point. Seventy percent of the cues were valid, i.e., correctly indicated the probed side of the map. The time course of the trials in the Perception task was similar to that of the Imagery task, but participants were instructed to orient attention to the left or right side of the screen. At the end of each trial, they indicated which of the two dots displayed was closest to the fixation point by pressing the up (orange) or down (blue) key. The difficulty of each block in the Perception task was adjusted to match the accuracy of the previous Imagery block, by adjusting the distance between the closest dot and the fixation point. We recorded response times, accuracy, and the electroencephalogram. Alpha power was computed during the attentional orienting delay and compared between left and right cued trials. B, Spatial parameters. Both tasks involved spatial discrimination based on the difference in distances from Paris (Imagery) or from the fixation point (Perception), referred to as task-relevant distance. Other spatial parameters are irrelevant to the task but may still influence participants' decisions, such as eccentricity and separation. Here, eccentricity is defined as the sum of distances from Paris in Imagery and from fixation in Perception.
Separation refers to the distance between the two cities in Imagery and between the two dots in Perception.
Materials and Methods
Participants
Thirty-four healthy volunteers completed the experiment after providing written informed consent. Because of the nature of the task, we preferentially recruited geography students (all but one participant). All participants were native French speakers recruited from a French university. They were right-handed; had normal or corrected-to-normal vision; had no history of neurological, psychiatric, cardiological, respiratory, or digestive disease; and reported no regular use of medication or illegal substances. Subjects received financial compensation of €45 for a maximal total duration of 3 h. The experimental task itself lasted ∼1 h; the remaining time included instructions, electrode placement, questionnaires, and breaks. The initial sample size of 34 was selected based on common practice in EEG studies investigating spatial attention in visual perception and visual working memory (Worden et al., 2000; Thut et al., 2006; Myers et al., 2015; Rösner et al., 2020) and included a conservative margin to account for data loss due to artifacts. Six participants were excluded from the analysis: one because of a technical error at acquisition, one because of exceedingly long reaction times (more than three standard deviations above the group mean), and four because fewer than 60 trials per task remained after EEG and eye-tracking data preprocessing. As a result, 28 participants were selected for analysis [13 males, 15 females, mean age (SD) = 21.07 (1.44), range 18–25 years old]. Procedures were approved by the INSERM ethics committee (IRB00003888—Avis 18-544-ter; 25.10.2021).
Stimuli selection
Eighty-five French cities and their GPS coordinates were selected from among the main urban areas and major tourist sites. From this list, two sets of city pairs were created: one in which both cities were located east of Paris and one in which both cities were located west of Paris. Pairs were excluded if the two cities were <15 km apart or if their respective distances from Paris differed by <15 km.
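For illustration, the pair-exclusion criteria can be sketched with great-circle distances computed from the GPS coordinates. The city list below is a small hypothetical subset; the 15 km thresholds follow the criteria above.

```python
import math
from itertools import combinations

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) GPS coordinates."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

PARIS = (48.8566, 2.3522)

# Hypothetical subset of west-of-Paris cities (lat, lon)
cities = {
    "Rennes": (48.1173, -1.6778),
    "Nantes": (47.2184, -1.5536),
    "Brest": (48.3904, -4.4861),
}

def valid_pairs(cities, min_sep_km=15.0, min_diff_km=15.0):
    """Keep pairs >=15 km apart whose distances from Paris differ by >=15 km."""
    kept = []
    for (a, ca), (b, cb) in combinations(cities.items(), 2):
        sep = haversine_km(*ca, *cb)
        diff = abs(haversine_km(*PARIS, *ca) - haversine_km(*PARIS, *cb))
        if sep >= min_sep_km and diff >= min_diff_km:
            kept.append((a, b))
    return kept

pairs = valid_pairs(cities)
```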
City pairs were then selected individually for each participant. At the beginning of the experiment, participants rated their confidence in locating each city on a map of France, using a discrete digital scale from 0 (no idea) to 5 (precise knowledge of the location). Importantly, no visual support was provided during this rating task, as our goal was for participants to rely on their general knowledge of the map during the Imagery task, rather than on visual stimuli recently encoded in short-term memory. No recordings were made during this phase apart from the confidence ratings; response times were not collected, as this step was not designed for analysis but solely to personalize the stimulus material based on participants' subjective knowledge. Across all participants and cities, the mean confidence rating was 3.55 (SD = 0.66), indicating overall good familiarity with the cities. A total of 105 pairs were then selected for each side (east and west) for each participant. Selection prioritized two criteria in hierarchical order: first, maximizing the balance of the correct answer’s relative position (north vs south) and second, maximizing the participant’s familiarity with the selected cities (i.e., highest confidence scores). Each pair was unique, though individual cities could appear in multiple pairs (mean = 15.12 pairs, SD = 10.46). This process ensured that the task was both participant-specific and optimized for spatial discrimination.
Tasks
In the Mental Imagery task (Fig. 1A), participants were asked to imagine a map of France, centered on Paris, while maintaining their gaze on the fixation symbol at the center of the screen. Although Paris is the mental reference point in the imagined map, participants were explicitly instructed to keep their gaze fixed on the central fixation symbol throughout the task, thereby ensuring that the reference frame for eye position was identical across the Imagery and Perception tasks. After a delay of 1 to 1.3 s, they were presented for 0.4 s with the attentional cue (thickening of the right or left side of the fixation symbol, with equal probability). They were instructed to orient their attention mentally to the cued side of their mental image of the map, from the moment the cue appeared on the screen. After another delay of 1.2 s, two French cities were presented above and below the fixation point. Participants were asked to indicate the city they considered to be closest to Paris, by pressing the up or down key on the keyboard. To minimize potential confounds related to spatial compatibility effects (Simon and Wolf, 1963), the northernmost city of each pair was always displayed at the top of the screen and the southernmost at the bottom. This consistent vertical mapping aimed to reduce interference between the spatial position of the stimuli and the required response, ensuring that performance reflected the intended experimental manipulation rather than response biases linked to spatial congruency. The screen position of the correct answer was counterbalanced as much as possible within each participant during the creation of city pairs, as described above in the Stimuli selection section. Participants were encouraged to answer as accurately as possible, with a maximal response time of 6 s. Their response was followed by an intertrial interval between 1.4 and 2.2 s.
In the Visual Perception task (Fig. 1A), participants were instructed to fixate the central point on the screen for 1–1.3 s and then to covertly orient their attention to the cued side of the screen upon the presentation of the attentional cue (same cue duration as in the Mental Imagery task). After a delay of 1.2 s, two dots were presented on the screen. The uppermost point was colored orange and the lowermost was colored blue. Participants were instructed to indicate which of the two dots they perceived as being closest to the fixation point, by pressing the corresponding key on the keyboard (up arrow for orange, down arrow for blue). This fixed color–response mapping served to disambiguate the response options and ensured that the task focused solely on judging proximity to fixation, without requiring participants to explicitly discriminate vertical position (i.e., upper vs lower). The position of the dots on each trial was derived from the city locations used in the Imagery task, with one dot’s position dynamically adjusted to match difficulty across tasks (see below, Dot locations and difficulty adjustment in the Perception task, for details). Consequently, the counterbalancing of the correct answer (i.e., the dot closest to fixation) mirrored that of the Imagery task.
Participants completed three blocks of each task. Each block lasted ∼10 min and consisted of 70 trials with 30% of the trials involving an invalid cue. More precisely, each block included 35 East/Right trials (24 valid + 11 invalid or 25 valid + 10 invalid, interleaved) and 35 West/Left trials (24 valid + 11 invalid or 25 valid + 10 invalid, interleaved). This resulted in a total of 147 valid and 63 invalid trials per task and per participant. The blocks were presented in alternation, starting with the Imagery task. Two different fixation symbols (a square and a circle) were used, with one symbol pseudorandomly assigned to each task prior to the experiment and counterbalanced across participants. The use of distinct fixation symbols served as a visual cue to indicate which task (Imagery or Perception) was currently being performed, to reduce potential task confusion and to facilitate engagement in mental imagery during the Imagery task. At the end of each block, participants were provided with feedback on their accuracy.
The choice of timings (ITI, warning, cue, postcue delay) was guided by the range commonly used in top-down, endogenous spatial attentional cueing paradigms from both the visual perception (Worden et al., 2000; Sauseng et al., 2005; Thut et al., 2006; Wyart and Tallon-Baudry, 2009; Gould et al., 2011; Wyart et al., 2012) and visual working memory (Ikkai et al., 2016; Rösner et al., 2020) literature, with cue presentations of a few hundred milliseconds and a delay for attentional orienting of about 1 s.
The tasks were programmed in Python, using PsychoPy v2021.2.3 (Peirce et al., 2019), and were presented on a gray 1,920 × 1,080 pixels screen, at a 70 cm viewing distance. Participants were instructed to blink only during the intertrial intervals.
Dot locations and difficulty adjustment in the Perception task
For each participant, visual dot locations in the Perception task were derived from the city locations used in the Imagery task and then adjusted to ensure comparable task difficulty, as detailed below. Initial dot positions were computed from the GPS coordinates of city pairs. Specifically, dots were placed at the screen locations where the cities would appear if a 500-pixel-diameter map of France were overlaid on the screen, with Paris centered at fixation. Those initial locations were then modulated to match accuracy in the Perception task to accuracy in the Imagery task.
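The projection from GPS coordinates to screen pixels is not spelled out beyond the 500-pixel map diameter and Paris at fixation; a minimal sketch, assuming a simple local equirectangular (linear latitude/longitude) scaling and an assumed ~1,000 km map extent, could look like this (all names and constants below are illustrative, not from the original code):

```python
import math

PARIS = (48.8566, 2.3522)           # (lat, lon), map center at fixation
MAP_DIAMETER_PX = 500               # map diameter on screen, from the text
FRANCE_EXTENT_KM = 1000.0           # assumed real-world extent covered by the map

def gps_to_pixels(lat, lon, center=PARIS,
                  px_per_km=MAP_DIAMETER_PX / FRANCE_EXTENT_KM):
    """Project a GPS coordinate to screen pixels relative to fixation,
    using a local equirectangular approximation around the map center.
    Screen y grows downward, so north maps to negative y."""
    km_per_deg_lat = 111.32
    km_per_deg_lon = 111.32 * math.cos(math.radians(center[0]))
    dx_km = (lon - center[1]) * km_per_deg_lon
    dy_km = (lat - center[0]) * km_per_deg_lat
    return dx_km * px_per_km, -dy_km * px_per_km

# Example: Marseille lies south-east of Paris, so x > 0 and screen y > 0
x, y = gps_to_pixels(43.2965, 5.3698)
```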
In practice, for the first five trials of each Perception block, difficulty was fixed: the dot closest to fixation was moved such that the task-relevant distance was 25 pixels. Starting from the sixth trial in the Perception task, the distance from fixation of one of the two dots was then dynamically adjusted on a trial-by-trial basis. Accuracy in the Perception task over a moving window of up to 10 previous trials was computed. If this accuracy fell below the target accuracy (defined as the overall accuracy in the previous Imagery block), the distance of the closest dot from fixation was reduced so that the task-relevant distance was increased by 15% (decreasing difficulty). If accuracy was above the target, the closest dot was shifted further from fixation, so that the task-relevant distance decreased by 10% (increasing difficulty). This adaptive procedure successfully equalized task performance: accuracy rates did not significantly differ between tasks, with moderate Bayesian evidence supporting an absence of difference [mean HR (SD), Perception: 81.5% (5.3), Imagery: 81.2% (5.7), paired t test t(27) = 0.45, p = 0.66, BF10 = 0.22].
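One step of the adaptive rule above can be sketched as follows (function and parameter names are ours; the window and the +15%/−10% factors come from the text):

```python
def adjust_relevant_distance(current_dist, recent_correct, target_acc):
    """One step of the adaptive difficulty procedure (sketch).
    current_dist: task-relevant distance (closest-dot offset), in pixels.
    recent_correct: booleans for the most recent Perception trials.
    target_acc: overall accuracy in the previous Imagery block."""
    window = recent_correct[-10:]           # up to 10 previous trials
    acc = sum(window) / len(window)
    if acc < target_acc:
        return current_dist * 1.15          # easier: larger distance difference
    elif acc > target_acc:
        return current_dist * 0.90          # harder: smaller distance difference
    return current_dist                     # on target: leave unchanged
```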
Training
Before receiving the instructions for the two tasks, participants were acquainted with the cues by performing 20 trials of each of two “warm-up” Go/No-go tasks. In the first, on each trial a cue was presented for 0.4 s, followed 1 s later by a dot on the left or right side of the screen. Participants were asked to press the space bar if the dot appeared on the cued side (70% valid). In the second, a cue was presented for 0.4 s, followed 1 s later by the name of a French county (“département,” a French administrative entity usually including several cities). Participants were instructed to respond only if the presented French county was on the cued side of the map (70% valid). Each task was then presented, followed by five practice trials, starting with the Perception task. The cities used for practice were selected from the pool of cities left after stimulus selection. No physiological or EEG signals were recorded during the training sessions.
Questionnaires
All participants completed two questionnaires designed to characterize individual variability in mental imagery abilities and strategies: French translations of the Vividness of Visual Imagery Questionnaire (VVIQ; Marks, 1973; Santarpia et al., 2008) and the Object-Spatial Imagery and Verbal Questionnaire (OSIVQ; Blazhenkova and Kozhevnikov, 2009; Bled and Bouvet, 2021). From the OSIVQ ratings, we computed three subscores corresponding to different strategies or cognitive styles: Object (analogous, vivid visual mental representations), Spatial (abstract or schematic representations of relationships), and Verbal subscores. No participant obtained a VVIQ score in the 16–32 range [mean VVIQ (SD) = 58.55 (10.63), range 36 to 79] that would have suggested aphantasia (Zeman et al., 2015; Dance et al., 2022). No trial-by-trial vividness or confidence ratings were collected during the tasks, to maximize the total number of trials while maintaining experiment duration within a reasonable time.
To objectively assess participants' spatial knowledge of the cities used in the experiment, participants performed a localization task at the end of the experimental session. We ran this control at the end of the experiment to avoid presenting a visual map of France at the beginning of the experiment, as our aim was to ensure that participants relied on long-term semantic memory rather than recent visual encoding during the main task. In this localization task, participants were asked to place the selected cities on a blank map of France (600 × 600 pixels) displayed on the screen. The coordinates of each city placed by participants (x_participant, y_participant) were recorded and compared with the true coordinates for each city (x_true, y_true). To assess the relative accuracy of the spatial arrangement of cities, a position correlation coefficient was calculated by correlating the estimated and true coordinates separately for the x- and y-axes. The general position correlation coefficient was obtained by averaging the correlations for both axes. A coefficient close to 1 indicates that participants preserved the relative spatial arrangement of cities well. To assess the absolute accuracy of their placements, the absolute difference between true and estimated positions was calculated for each city and normalized by the map diagonal. Position errors are therefore expressed as a proportion of the map's maximum size.
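The two scores above can be sketched as follows (a sketch assuming the absolute error is the Euclidean distance between placed and true positions; names are ours):

```python
import numpy as np

def map_scores(est, true, map_size=600):
    """est, true: (n_cities, 2) arrays of (x, y) pixel coordinates.
    Returns (position correlation coefficient, mean normalized error):
    - the mean of the x-axis and y-axis Pearson correlations;
    - the mean estimated-vs-true distance divided by the map diagonal."""
    est, true = np.asarray(est, float), np.asarray(true, float)
    r_x = np.corrcoef(est[:, 0], true[:, 0])[0, 1]
    r_y = np.corrcoef(est[:, 1], true[:, 1])[0, 1]
    pos_corr = (r_x + r_y) / 2
    diag = np.sqrt(2) * map_size
    err = np.linalg.norm(est - true, axis=1).mean() / diag
    return pos_corr, err
```

A perfect placement yields a correlation of 1 and an error of 0; a uniform shift of all cities keeps the correlation at 1 (relative arrangement preserved) while the normalized error grows.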
Data acquisition
Recordings were performed in an electrically shielded room using the Biosemi ActiveTwo recording system (Biosemi) with a sampling rate of 1,024 Hz (DC–400 Hz bandwidth) and the Actiview software. EEG data was recorded using 64 pin-type active electrodes mounted in a flexible headcap with CMS and DRL placed left and right from POz, respectively. For the physiological recordings, flat-type active electrodes were used. The ECG was recorded using three electrodes, two of which were placed on the left and right clavicles, and one on the left lower abdomen. The vertical electro-oculogram (EOG) was recorded by placing one electrode ∼1 cm above the brow, and one ∼1 cm below the eye on the right side. Eye position was tracked using an EyeLink 1000 system (SR Research), using a monocular recording of the right eye, with a 35 mm lens at a sampling rate of 1,000 Hz. An eyetracker calibration was performed at the beginning of each block.
Behavioral data analysis
Trial selection
Trials with a response time shorter than 200 ms or exceeding 4 standard deviations from the participant's mean were excluded from all analyses (a maximum of four trials removed per participant). Behavioral analyses were conducted on correct trials only, without further splitting by cue direction (Left/Right or West/East). After outlier exclusion, an average of 119.61 valid trials (SD = 8.5, range = 105–134) and 50.86 invalid trials (SD = 4.3, range = 43–60) remained in the Perception task. In the Imagery task, an average of 119.71 valid trials (SD = 8.1, range = 105–131) and 50.68 invalid trials (SD = 4.9, range = 41–60) were retained.
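The exclusion rule can be sketched as a boolean mask over a participant's response times (in seconds; names are ours):

```python
import numpy as np

def trial_mask(rts_s):
    """Boolean mask of trials to keep: RT >= 200 ms and within
    4 SD of the participant's mean RT (sketch of the rule above)."""
    rts = np.asarray(rts_s, float)
    mu, sd = rts.mean(), rts.std()
    return (rts >= 0.2) & (np.abs(rts - mu) <= 4 * sd)
```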
Statistical analysis: pairwise comparisons and correlation tests
Response times were analyzed using the R software (version 4.3.3; R Core Team, 2024) and the RStudio interface (version 2024.04.2.764; Posit team, 2024). We compared means using two-sided paired t tests with the t.test R function (stats package; R Core Team, 2024). We explored the relationships between individual estimates across tasks, between estimates and questionnaire scores, or between electrophysiological measures and behavioral outcomes using winsorized Pearson's robust correlation tests, with a threshold of 10%, as implemented in the WRS package (Mair and Wilcox, 2020). Correlation coefficients are reported as rW. The alpha level for type I error was set at 0.05 for both comparison and correlation tests.
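For readers less familiar with winsorized correlation, the idea can be sketched in a few lines (a sketch: quantile-based clamping shown here differs slightly from the order-statistic winsorizing used by WRS):

```python
import numpy as np

def winsorize(x, gamma=0.10):
    """Clamp the lowest and highest gamma fraction of values to the
    corresponding quantile cut points (quantile-based approximation)."""
    x = np.asarray(x, float)
    lo, hi = np.quantile(x, [gamma, 1 - gamma])
    return np.clip(x, lo, hi)

def winsorized_pearson(x, y, gamma=0.10):
    """Pearson correlation computed on marginally winsorized data (rW)."""
    return np.corrcoef(winsorize(x, gamma), winsorize(y, gamma))[0, 1]
```

Clamping the tails before correlating limits the leverage of extreme observations, so a single outlier can no longer dominate the coefficient.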
In addition to the mentioned comparison and correlation tests, Bayesian statistical tests were also performed to quantify the level of evidence for or against the null hypothesis, using the JASP software v0.17.3 (JASP team, 2023) and the bayesFactor package (Krekelberg, 2024) in MATLAB. For two-sided comparison tests comparing hypothesis H1 (difference) to the null hypothesis H0 of an absence of difference, Bayes factors are reported as BF10. The following interpretive thresholds were used, based on the existing literature (Keysers et al., 2020; van Doorn et al., 2021). A BF10 of 1 indicates equal support for H0 and H1. Values greater than 1 indicate evidence in favor of H1: anecdotal (1–3), moderate (3–10), and strong (>10). Conversely, BF10 values smaller than 1 indicate evidence in favor of H0: anecdotal (0.33–1), moderate (0.1–0.33), and strong (<0.1). For two-sided correlation tests, Bayes factors were computed on winsorized data to reduce the influence of outliers and are reported as BF10W. When relevant, one-tailed robust correlation tests were used to specifically assess evidence for or against a positive correlation. The corresponding Bayes factors are reported as BF+0W. A BF+0W greater than 1 indicates evidence for a positive correlation (with the same interpretive thresholds as above), while values smaller than 1 indicate evidence against a positive correlation. Although they represent distinct methodological frameworks, both frequentist and Bayesian results are reported jointly, in line with recent practical recommendations in neuroscience to enhance the transparency of statistical inference, provided that cases of divergence between the two approaches are interpreted with appropriate caution (Keysers et al., 2020). FDR correction was applied to control for multiple comparisons when necessary. All tests were two-tailed, except when otherwise specified in the Results.
Statistical modeling of response time
Linear mixed models of response times were built using the lmer function from the lme4 package (Bates et al., 2015). Response times were log10 transformed, as it provided better linearity of residuals in the models below compared with raw response times, and led to the same conclusions as raw RTs. To extract the fixed effects of cue validity and spatial parameters on response times, we used the following model for each task, with a random intercept for subject:
Model 1:
Log10(RT) ~ Cue validity + Task-relevant distance + Eccentricity + Separation + (1 | Subject)
Where:
Log10(RT) corresponds to the log10 transformed response time in seconds
Cue validity is a two-level factor (0 = invalid cue, 1 = valid cue)
Task-relevant distance, eccentricity, and separation refer to the spatial parameters depicted in Figure 1B. Values in kilometers (in Imagery) or in pixels (in Perception) were z-scored across all pairs and subjects:
Task-relevant distance corresponds to the difference in distances from Paris (Imagery) or from fixation (Perception), which is the main criterion for decision (task-relevant)
Eccentricity (task-irrelevant) corresponds to the sum of distances from Paris (Imagery) or from fixation (Perception)
Separation (task-irrelevant) corresponds to the distance between the two cities (Imagery) or the two dots (Perception)
To investigate interindividual variability in cue validity effects on response time, we modeled response times with a nested random effect of cue validity:
Model 2:
Log10(RT) ~ Cue validity + Task-relevant distance + Eccentricity + Separation + (1 | Subject/Cue validity)
To clarify the format of Imagery and Perception, on which attention is deployed, we adapted the model of response time by adding random slopes for task-relevant and task-irrelevant spatial parameters:
Model 3:
Log10(RT) ~ Cue validity + Task-relevant distance + Eccentricity + Separation + (1 + Task-relevant distance + Eccentricity + Separation | Subject)
Finally, to test whether the influence of eccentricity differed across tasks, we specifically tested the interaction between eccentricity and task in a linear mixed-effects model of reaction times, including both tasks. This model included fixed effects for cue validity, task-relevant distance, separation, eccentricity, task (Imagery vs Perception), and the eccentricity × task interaction, as well as random intercepts by participant:
Model 4:
Log10(RT) ~ Cue validity + Task-relevant distance + Separation + Eccentricity * Task + (1 | Subject)
Data preprocessing
Offline preprocessing and analysis of physiological and neural signals was done using the FieldTrip toolbox implemented in Matlab (Oostenveld et al., 2011) and additional custom-built Matlab code.
EEG preprocessing
Continuous EEG data was bandpass filtered between 0.5 and 45 Hz using a zero-phase shift forward and reverse fourth-order Butterworth filter. For each channel, the area under the curve (AUC) of the signal amplitude was calculated. Channels exceeding ±3 standard deviations from the mean AUC across all channels were considered outliers, as this threshold provides a common and conservative criterion for identifying excessively noisy electrodes in the absence of an independent channel quality index (Nolan et al., 2010). These channels were repaired using a weighted average of unfiltered neighboring channels (correction applied in nine participants with a maximum of two channels repaired). The data were then rereferenced to a common average of all channels.
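The channel-outlier criterion can be sketched as follows (a sketch: the AUC is approximated here by the sum of absolute amplitudes per channel; names are ours):

```python
import numpy as np

def outlier_channels(data, z_thresh=3.0):
    """Flag channels whose area under the curve of the absolute amplitude
    deviates by more than z_thresh SDs from the mean AUC across channels.
    data: (n_channels, n_samples) array of filtered EEG."""
    auc = np.abs(data).sum(axis=1)
    sd = auc.std()
    if sd == 0:
        return np.array([], dtype=int)   # all channels identical: no outliers
    z = (auc - auc.mean()) / sd
    return np.where(np.abs(z) > z_thresh)[0]
```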
Automatic artifact detection was performed on continuous EEG data using 10 s time windows and validated by visual inspection. A padding of 100 ms was systematically applied before and after each detected artifact. Sharp and transient changes in signal amplitude were identified using the absolute derivative of median-filtered (ninth-order) data and a cutoff of 20 standard deviations. To detect muscle artifacts, segmented data was bandpass filtered between 110 and 140 Hz (eighth-order, zero-phase shift forward and reverse Butterworth filter), Hilbert transformed to obtain the instantaneous amplitude envelope, and then boxcar filtered using 0.2 s windows. The Hilbert transform was used to extract the envelope of narrow-band high-frequency activity with high temporal resolution (Bruns, 2004). Within each 10 s time window, segments exceeding 10 standard deviations of the z-scored data were considered artifacts. Blinks were detected on the vertical EOG by using a cutoff of 2.5 standard deviations from the mean on bandpass filtered (2–15 Hz, fourth-order, zero-phase shift forward and reverse Butterworth filter), Hilbert transformed EOG data. Epochs contaminated by saccades of an amplitude exceeding 1°, as detected by the EyeLink software, were also identified. Artifact rejection was done on segmented EEG data, with trials beginning 2 s before and ending 3 s after cue onset. Trials containing artifacts between −0.6 s and +1.7 s relative to cue onset were excluded from EEG analysis.
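The muscle-artifact step (Hilbert envelope, boxcar smoothing, z-score threshold) can be sketched as follows, assuming the 110–140 Hz bandpass has already been applied; the FFT-based envelope below is equivalent to the magnitude of the analytic signal:

```python
import numpy as np

def analytic_envelope(x):
    """Instantaneous amplitude via the analytic signal (FFT-based Hilbert)."""
    n = len(x)
    spec = np.fft.fft(x)
    h = np.zeros(n)                # one-sided spectrum multiplier
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spec * h))

def muscle_artifact_mask(envelope, fs=1024, box_s=0.2, z_thresh=10.0):
    """Boxcar-smooth the amplitude envelope over box_s seconds, z-score it,
    and flag samples exceeding z_thresh SDs (cutoff from the text above)."""
    k = max(1, int(box_s * fs))
    smoothed = np.convolve(envelope, np.ones(k) / k, mode="same")
    z = (smoothed - smoothed.mean()) / smoothed.std()
    return z > z_thresh
```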
Cardiac artifacts attenuation
ECG lead II bipolar derivation was computed offline by subtracting the right clavicle electrode signal from the lower left abdomen electrode signal. To detect R-peaks, the lead II ECG signal was first bandpass filtered between 1 and 40 Hz (windowed-sinc FIR filter). A template cardiac cycle was then computed and convolved with the entire ECG time series for each block. A threshold of 0.6 on the normalized convolution output was used to detect R-peaks.
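The template-matching step can be sketched as follows (a simplified sketch: the toy template and the refractory period are our assumptions; in practice the template is an averaged cardiac cycle):

```python
import numpy as np

def detect_r_peaks(ecg, template, thresh=0.6, refractory=200):
    """Correlate a template beat with the ECG, normalize the output by its
    maximum, and keep local maxima above `thresh`, enforcing a refractory
    period (in samples) between successive peaks."""
    c = np.correlate(ecg, template, mode="same")
    c = c / np.abs(c).max()
    peaks, last = [], -refractory
    for i in range(1, len(c) - 1):
        if c[i] > thresh and c[i] >= c[i - 1] and c[i] >= c[i + 1] \
                and i - last >= refractory:
            peaks.append(i)
            last = i
    return peaks
```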
An independent component analysis (ICA) correction was used to attenuate cardiac artifacts as described previously (Buot et al., 2021). Continuous EEG and ECG data were first high-pass filtered with a threshold of 0.5 Hz using a fourth-order zero-phase shift forward and reverse Butterworth filter. Artifact-free EEG and ECG data were then segmented in 400 ms epochs centered on the R-peak. EEG epochs time locked to the R-peak were decomposed into independent components, and pairwise phase consistency was computed between each of these components and the ECG signal in the 0–25 Hz range. Components exceeding 3 standard deviations from the mean pairwise phase consistency were selected, up to a maximum of three components (with the highest values), and finally rejected from the EEG data. All reported analyses were performed on bandpass filtered (0.5–40 Hz, fourth-order zero-phase shift two-pass Butterworth filter), artifact-free, ICA-corrected (for cardiac field artifact) EEG data.
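The component-selection step can be illustrated with a minimal sketch of pairwise phase consistency (PPC; the unbiased estimator of Vinck et al., 2010) followed by the 3-SD cut described above. The epoching, filtering, and frequency loop are omitted; function names and the use of this particular PPC estimator are assumptions.

```python
import numpy as np

def ppc(phase_diffs):
    """Unbiased pairwise phase consistency of a set of phase differences (rad)."""
    n = len(phase_diffs)
    s = np.sum(np.exp(1j * phase_diffs))
    # Equivalent to the average cosine of all pairwise differences of the phases.
    return (np.abs(s) ** 2 - n) / (n * (n - 1))

def select_cardiac_components(ppc_values, n_sd=3.0, max_components=3):
    """Indices of components whose PPC exceeds mean + n_sd * SD (at most three)."""
    ppc_values = np.asarray(ppc_values)
    cut = ppc_values.mean() + n_sd * ppc_values.std()
    candidates = np.flatnonzero(ppc_values > cut)
    # Keep the components with the highest PPC values.
    order = candidates[np.argsort(ppc_values[candidates])[::-1]]
    return order[:max_components]
```

Phase-locked component–ECG phase differences yield PPC near 1, whereas random phase relations yield PPC near 0.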
Trial selection for EEG analysis
EEG analysis was restricted to correct trials with an artifact-free window of [−0.6 1.7 s] relative to cue onset. Because the analysis focused exclusively on the delay period between cue offset and response screen (i.e., before stimulus processing or response selection), no distinction was made between valid and invalid trials. In addition, trials identified as behavioral outliers—based on response times exceeding 4 standard deviations from the participant’s mean in each condition—were excluded from EEG analyses. After artifact rejection and outlier removal, an average of 104.82 (SD = 26.7, range = 54–162) correct trials remained in the Perception task (left cue: mean = 53.89, SD = 14.1, range = 22–82; right cue: mean = 50.93, SD = 13.4, range = 31–80) and an average of 101.96 (SD = 24.3, range = 57–144) correct trials in the Imagery task (left cue: mean = 50.75, SD = 11.6, range = 32–76; right cue: mean = 51.21, SD = 13.9, range = 25–71).
EEG analysis
Time–frequency analysis and power modulation index
For each subject and for each task, selected trials were sorted according to the cued side (left cue, right cue). Then, time–frequency representations (TFRs) were computed for each condition, using Morlet wavelets. A time–frequency wavelet transform (width = 7) was applied at each channel, between 7 and 20 Hz, with power estimates computed every 0.01 s.
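A single-channel Morlet TFR of this kind can be sketched as follows. The width parameter (number of cycles) comes from the text; the unit-energy normalization and wavelet support are assumptions.

```python
import numpy as np

def morlet_power(x, fs, freqs, width=7):
    """Time-frequency power of one channel via complex Morlet wavelet convolution."""
    power = np.empty((len(freqs), len(x)))
    for i, f in enumerate(freqs):
        sigma_t = width / (2 * np.pi * f)            # temporal SD of the wavelet
        t = np.arange(-4 * sigma_t, 4 * sigma_t, 1 / fs)
        wavelet = np.exp(2j * np.pi * f * t) * np.exp(-t**2 / (2 * sigma_t**2))
        wavelet /= np.sqrt(np.sum(np.abs(wavelet) ** 2))  # unit-energy scaling
        # Power is the squared magnitude of the complex wavelet coefficients.
        power[i] = np.abs(np.convolve(x, wavelet, mode="same")) ** 2
    return power
```

With width = 7, the spectral bandwidth at frequency f is approximately f/7 Hz, so a 10 Hz oscillation produces power concentrated around the 10 Hz row.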
To quantify attentional orienting, we computed the power modulation index (PMI), adapted from Leenders et al. (2018), as the normalized power difference between left cue and right cue trials:

PMI(f,t) = [Power_left cue(f,t) − Power_right cue(f,t)] / [Power_left cue(f,t) + Power_right cue(f,t)]

For each subject and for each task, PMI time courses were calculated at each electrode:
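The index as described (a normalized left-minus-right power difference, adapted from Leenders et al., 2018) amounts to a one-line computation on the two condition-averaged TFR arrays; this sketch is illustrative, not the authors' code.

```python
import numpy as np

def pmi(tfr_left, tfr_right):
    """Power modulation index from two TFR arrays (frequency x time)."""
    tfr_left = np.asarray(tfr_left, dtype=float)
    tfr_right = np.asarray(tfr_right, dtype=float)
    # Normalized difference: positive where left-cue power exceeds right-cue power.
    return (tfr_left - tfr_right) / (tfr_left + tfr_right)
```

Averaging the resulting array over the 8–12 Hz rows yields the PMIalpha(t) time course analyzed below.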
Analysis of alpha power modulation
We were interested in cue-induced modulations of alpha power in both tasks. We therefore conducted analyses on the time window starting from cue offset and ending at response screen ([0.4 1.6 s] relative to cue onset), which corresponds to the time during which participants were instructed to orient and maintain their attention toward the cued side of either the mental image of the map (Imagery task) or the visual field (Perception task). This time window is commonly used to isolate the alpha lateralization that corresponds to endogenous, voluntary spatial attention orienting (Worden et al., 2000; Thut et al., 2006; Wyart and Tallon-Baudry, 2009).
PMIalpha(t) time courses were compared with the null using a nonparametric cluster-based permutation approach (Maris and Oostenveld, 2007), which corrects for multiple comparisons. Two-tailed paired t tests were used to compare each sample to zero at each electrode. Candidate clusters were defined by adjacent points in time and space with more than three neighboring electrodes, exceeding a statistical threshold of 0.05 (cluster-forming alpha level). A Monte Carlo method was used to compute cluster statistics: condition labels (experimental PMI or null value) were randomized 5,000 times, and for each randomization the largest (positive) and smallest (negative) sum(t) of the observed clusters were used to generate the distribution of largest and smallest sum(t) under the null hypothesis. The Monte Carlo p value corresponds to the proportion of cluster statistics under the null distribution that exceed the original cluster-level test statistic. Because this method considers the largest and smallest sum(t) at each permutation, it intrinsically corrects for multiple comparisons across time samples and electrodes. A repeated-measures ANOVA was conducted on the average PMIalpha across the significant clusters to examine the spatial dissociation between tasks. For the two conditions, PMIalpha was averaged over the [0.549 0.929 s] time window for the frontal cluster (Fpz, Fp1, Fp2, AFz, AF3, AF4, AF7, AF8, Fz, F1, F2, F3, F4, F5, F6, FCz, FC1, FC2, FC3) and over the [0.4 0.69 s] time window for the posterior cluster (CPz, CP1, CP3, Pz, P1, P3, P5, POz, PO3, PO7, O1). To test whether the observed task-by-site dissociation could be explained by differences in time windows, we conducted a complementary 2 × 2 × 2 repeated-measures ANOVA including site (frontal, posterior), task (Imagery, Perception), and time (early, 0.4–0.55 s; late, 0.69–0.93 s) as within-subject factors.
These nonoverlapping time windows were selected from the original significant clusters to minimize confounding effects due to the 140 ms overlap between them.
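The permutation logic can be illustrated in a deliberately simplified one-dimensional form (a single electrode, clustering over time only, sign-flip randomization against zero). The real analysis additionally clusters over neighboring electrodes and does not mix positive and negative supra-threshold points within a cluster; all function names here are illustrative.

```python
import numpy as np
from scipy import stats

def clusters_from_tvals(tvals, t_crit):
    """Contiguous supra-threshold runs and their sum(t) cluster statistics."""
    out, start = [], None
    for i, above in enumerate(np.abs(tvals) > t_crit):
        if above and start is None:
            start = i
        elif not above and start is not None:
            out.append((start, i, tvals[start:i].sum()))
            start = None
    if start is not None:
        out.append((start, len(tvals), tvals[start:].sum()))
    return out

def cluster_permutation_pvals(data, n_perm=1000, alpha=0.05, seed=0):
    """data: subjects x time. Test against zero with sign-flip permutations."""
    n_sub = data.shape[0]
    t_crit = stats.t.ppf(1 - alpha / 2, n_sub - 1)   # cluster-forming threshold
    t_obs = stats.ttest_1samp(data, 0.0, axis=0).statistic
    observed = clusters_from_tvals(t_obs, t_crit)
    rng = np.random.default_rng(seed)
    null_max = np.zeros(n_perm)
    for p in range(n_perm):
        # Randomly flip the sign of each subject's time course.
        flips = rng.choice([-1.0, 1.0], size=(n_sub, 1))
        t_perm = stats.ttest_1samp(data * flips, 0.0, axis=0).statistic
        sums = [abs(s) for _, _, s in clusters_from_tvals(t_perm, t_crit)]
        null_max[p] = max(sums) if sums else 0.0
    # Monte Carlo p: proportion of null maxima exceeding each observed sum(t).
    return [(start, stop, float(np.mean(null_max >= abs(s))))
            for start, stop, s in observed]
```

Because each permutation contributes only its largest cluster statistic to the null distribution, the resulting p values are corrected for multiple comparisons across time points, as described above.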
Control of saccades
Saccades were identified from the eye-tracking signal with the EyeLink software. Trials containing large saccades (i.e., >1° of visual angle) during the postcue delay period (0.4–1.6 s relative to cue onset) were excluded from all EEG analyses. To assess whether small (<1°) saccade occurrence or direction systematically differed across left cue and right cue trials, we performed a linear mixed-effects model on small saccade counts, with task (Imagery, Perception), cue direction (left, right), and saccade direction (left, right) and their interactions as fixed effects and subject as a random effect.
Source reconstruction
We reconstructed the sources of alpha activity using the dynamic imaging of coherent sources (DICS) spatial filter approach (Gross et al., 2001) as implemented in FieldTrip. To build a forward model, we used the standard boundary element method (BEM) volume conduction head model provided by FieldTrip (Oostenveld et al., 2003), a 5 mm grid source model (inward shift zero), the default Colin27 MRI anatomy, together with the Biosemi 64-electrode 10–20 layout. For each task and for each subject, we computed a common DICS spatial filter based on the cross-spectral density (CSD) matrix of right and left cue trials altogether. CSD matrices were computed for all sensor combinations from Fourier transformed data, using DPSS tapers at a frequency of 10 Hz and with a smoothing window of ±2 Hz. To accommodate the 2 Hz smoothing, a temporal window of 400 ms was selected as it represents the minimum duration required for reliable estimation of frequencies within the 8–12 Hz range. Based on the timing of significant alpha power modulations at the electrode level, the 400 ms windows of [0.4 0.8 s] and [0.55 0.95 s] were therefore used for Perception and Imagery, respectively. The DICS spatial filter was then applied to each cue condition at each grid point of the anatomical source model. Finally, PMIalpha was calculated at each grid point and averaged across subjects.
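At its core, a DICS filter at one grid point is a unit-gain beamformer built from the sensor-level CSD matrix. The bare-bones sketch below assumes a fixed source orientation and Tikhonov-style regularization; the actual analysis used FieldTrip's implementation with the forward model described above.

```python
import numpy as np

def dics_filter(leadfield, csd, reg=0.05):
    """Unit-gain spatial filter w for one source given a sensor CSD matrix."""
    n = csd.shape[0]
    # Diagonal regularization scaled by the mean sensor power (assumed scheme).
    c = csd + reg * np.trace(csd).real / n * np.eye(n)
    c_inv = np.linalg.inv(c)
    l = leadfield.reshape(-1, 1)
    # w = C^-1 L (L^H C^-1 L)^-1, which passes its own source with gain 1.
    w = c_inv @ l @ np.linalg.inv(l.conj().T @ c_inv @ l)
    return w.ravel()

def source_power(w, csd):
    """Source-level power: w^H C w."""
    return float(np.real(w.conj() @ csd @ w))
```

As in the text, a common filter would be computed from the CSD of left and right cue trials combined, then applied to each condition's CSD, and the source-level PMIalpha obtained as the normalized power difference at each grid point.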
Data and code accessibility
The custom code and data used for the main analyses in this article can be accessed online at https://osf.io/8fdu9/?view_only=7dc6e741e5c645ff8c64e853384625af. According to European and French law on personal data protection, sharing individual EEG data requires establishing an institutional data sharing agreement.
Results
Accuracy
Participants, all but one of whom were enrolled in geography classes, performed the Imagery task with good accuracy (mean = 81.2%, SD = 5.7, min = 71.9, max = 90.9). Accuracy in the Perception task was matched by design (mean = 81.5%, SD = 5.3, min = 72.4, max = 90.5). Participants demonstrated excellent geographic knowledge, as assessed at the very end of the experiment by placing cities on a blank map, with a mean general position correlation of 0.91 (min = 0.74, max = 0.99, SD = 0.08), and high absolute placement accuracy, with an average normalized error of 6.82% (min = 2.04%, max = 14.5%, SD = 3.3%).
Spatial attention benefits both Visual Perception and Mental Imagery
Hit rates were not significantly affected by spatial attention in either task [mean HR (SD), Perception: valid trials = 81.76% (5.7), invalid trials = 81.2% (6.8), paired t(27) = −0.44, p = 0.665, BF10 = 0.24; Imagery: valid trials = 81.78% (5.7), invalid trials = 80.87% (7.5), paired t(27) = −0.92, p = 0.368, BF10 = 0.29]. All subsequent analyses were carried out on correct trials only.
We first tested for an influence of spatial attention on response time (Fig. 2A). As expected in the Visual Perception task, mean response times were significantly shorter for validly cued than for invalidly cued trials [mean RT (SD), valid trials = 1.128 s (0.18); invalid trials = 1.182 s (0.19); paired t test t(27) = 3.56, p = 0.001, BF10 = 25.19]. Spatial attention also significantly shortened response times in the Mental Imagery task [mean RT (SD), valid trials = 1.840 s (0.25); invalid trials = 1.893 s (0.28); paired t test t(27) = 2.79, p = 0.01, BF10 = 4.77]. Spatial attention is thus deployed not only in Perception but also in Imagery, with medium effect sizes in both tasks (Perception: Cohen’s d = 0.67; Imagery: Cohen’s d = 0.53).
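The validity contrast reported here reduces to a paired t test on per-subject mean RTs together with a within-subject Cohen's d; the following is an illustrative sketch with scipy, not the authors' analysis code.

```python
import numpy as np
from scipy import stats

def validity_effect(rt_valid, rt_invalid):
    """Paired t test and Cohen's d for valid vs invalid per-subject mean RTs."""
    # Positive differences indicate a benefit of valid cues (faster responses).
    diff = np.asarray(rt_invalid) - np.asarray(rt_valid)
    t, p = stats.ttest_1samp(diff, 0.0)
    d = diff.mean() / diff.std(ddof=1)  # Cohen's d for paired data
    return t, p, d
```

Applied to 28 per-subject RT pairs, this yields the t(27), p, and d values reported above.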
Effects of cue validity on response time in correct trials. A, Mean response times were significantly faster in valid trials compared with invalid trials, in both Perception (blue) and Imagery (green) tasks. B, Individual estimates (Model 2) of cue validity on response time were significantly different from zero in both Perception (t(27) = 4.59, p < 0.001, BF10 = 282.72) and Imagery (t(27) = 3.43, p = 0.002, BF10 = 18.81). C, Individual estimates for cue validity did not positively correlate across tasks (one-sided correlation test, moderate Bayesian evidence, BF+0W = 0.11). **p < 0.01, ***p < 0.001.
To confirm that spatial attention operates in mental images from long-term memory, we built a linear mixed model (Model 1) that quantifies not only the influence of spatial attention but also the importance of the other spatial parameters of the two tasks (task-relevant distance, i.e., difference of the distance of points to fixation or cities to Paris; sum of the distance of points to fixation, which is related to overall eccentricity; and separation between points, see Fig. 1B for a graphical description). As indicated in Table 1, Model 1 confirmed the effect of cue validity in both tasks, even when other spatial parameters were accounted for, at the group level. The influence of other spatial parameters is further explored in the next section.
Linear mixed-effects model explaining reaction times on correct trials in the Perception and Imagery tasks
Spatial attention benefits in Imagery and Perception are not positively correlated
We further reasoned that if spatial attention mechanisms of the same nature are deployed in both tasks, we should observe a positive relationship between individual estimates of cue validity across tasks. We thus extracted the individual estimates of cue validity in a model (Model 2) identical to Model 1 but where cue validity was a random effect nested within subjects. As illustrated in Figure 2B, individual estimates of cue validity on response time were significantly different from zero in both Perception (t(27) = 4.59, p < 0.001, BF10 = 282.72) and Imagery (t(27) = 3.43, p = 0.002, BF10 = 18.81), as expected from group effects in Model 1. We tested for differences between tasks and found that cue validity had a greater impact in Perception than in Imagery (t(27) = 2.80, p = 0.009, BF10 = 4.90). However, we found no significant correlation between individual estimates of spatial attention influence on RTs, with an inconclusive Bayes factor (robust two-sided correlation test, r²W = 0.06, t(26) = −1.27, p = 0.21, BF10W = 0.79; Fig. 2C). Furthermore, when testing specifically for a positive correlation, i.e., the direction of effect that would be expected if spatial attention operated similarly in both tasks, we found moderate evidence against such a relationship (one-sided correlation test, BF+0W = 0.11, i.e., below the conventional threshold of 0.33). As a comparison, we verified that, in Model 2, random intercepts covaried across tasks (r²W = 0.37, t(26) = 3.90, p < 0.001, BF10W = 54.85), i.e., we could replicate the well-known finding that fast participants in one task are also fast in the other. Altogether, these analyses suggest that although valid cues shorten response times in both tasks, the magnitude of this benefit does not positively covary across tasks.
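The robust correlation statistics reported here (r²W) are consistent with a winsorized correlation; the sketch below makes that interpretation explicit as an assumption, winsorizing both variables at 20% per tail before computing a Pearson correlation (the exact trim level and inference procedure used in the article may differ).

```python
import numpy as np
from scipy import stats

def winsorized_correlation(x, y, prop=0.2):
    """Pearson correlation after winsorizing each variable at `prop` per tail."""
    # Winsorizing clips extreme values to the nearest retained quantile,
    # limiting the leverage of outliers on the correlation estimate.
    xw = stats.mstats.winsorize(np.asarray(x, float), limits=(prop, prop))
    yw = stats.mstats.winsorize(np.asarray(y, float), limits=(prop, prop))
    r, p = stats.pearsonr(np.asarray(xw), np.asarray(yw))
    return r, p
```

With 28 participants per task, a single extreme individual estimate can dominate an ordinary Pearson correlation, which is why a robust variant is preferable here.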
Finally, we investigated whether interindividual variability in the validity effect reflected differences in mental imagery vividness or styles, by testing correlations with the questionnaires. After correction for multiple comparisons (four tests), individual estimates for cue validity were not significantly correlated with VVIQ scores (r²W = 0.06, t(26) = 1.36, pFDR = 0.37, BF10W = 0.88) nor with any of the OSIVQ subscales (Object: r²W = 0.10, t(26) = 1.66, pFDR = 0.37, BF10W = 1.34; Spatial: r²W < 0.01, t(26) = −0.12, pFDR = 0.90, BF10W = 0.42; Verbal: r²W = 0.02, t(26) = −0.72, pFDR = 0.64, BF10W = 0.52). All Bayes factors provided inconclusive evidence either for or against the presence of a correlation, leaving open the question of the link between spatial attention influence on behavior and mental imagery questionnaires.
Mental Imagery operates in a spatial format that is not analogous to Visual Perception
To what extent does mental imagery operate in a spatial format analogous to perception in this experiment? Beyond the effect of spatial attention detailed in the section above, other spatial parameters have a strong influence on response times, with some commonalities but also differences between tasks (Table 1). In both tasks, the larger the task-relevant distance and the smaller the task-irrelevant distance between cities/dots (separation), the faster the response time. These results suggest that mental representations encode not only relative spatial relationships but also absolute distances, which can influence behavior even when participants are not explicitly asked to simulate navigation along these distances.
However, unlike in Perception, eccentricity did not significantly affect response times in Imagery. To confirm that the effect of eccentricity differed across tasks, we specifically tested the eccentricity × task interaction in a linear mixed-effects model of reaction times. This analysis revealed a significant interaction (estimate = −1.59 × 10⁻², SE = 2.82 × 10⁻³, t = −5.65, p < 0.001), where the influence of eccentricity on response time was markedly reduced in the Imagery task compared with the Perception task (Table S1). These findings indicate that while mental imagery preserves key spatial properties, the way these properties shape behavior diverges from perceptual processing.
To further explore this idea, we isolated individual values for each spatial parameter using a third linear mixed model (random slopes, Model 3) and tested for correlations across tasks. More precisely, if spatial parameters are represented in a similar manner in both tasks, one would predict a positive correlation between spatial parameter influence on reaction times in each task. As expected from Model 1, task-relevant distance impacted both Perception and Imagery in the same direction, but we found inconclusive Bayesian evidence regarding a correlation of individual estimates across tasks (two-sided correlation test, r²W = 0.04, t(27) = −1.13, uncorrected p = 0.27, BF10W = 0.68; Fig. 3A). However, when testing specifically for a positive correlation, we found moderate Bayesian evidence for an absence of such a positive correlation (one-sided correlation test, BF+0W = 0.12). For the task-irrelevant eccentricity parameter, Bayesian evidence was inconclusive regarding a correlation (two-sided correlation test, r²W = 0.15, t(27) = −2.15, uncorrected p = 0.04, BF10W = 2.03) but robust for an absence of positive correlation (one-sided correlation test, BF+0W = 0.09; Fig. 3B). Finally, we found inconclusive Bayesian evidence regarding a correlation between the influence of separation on response time across tasks (two-sided test, r²W < 0.01, t(27) = 0.55, p = 0.59, BF10 = 0.44, one-sided test, BF+0 = 0.34; Fig. 3C). Because our reasoning is based on negative findings supported by Bayesian evidence, we verified, as a comparison, that the random intercepts were significantly correlated across tasks in Model 3, i.e., participants who were fast in one task were also fast in the other (two-sided test, r²W = 0.33, t(27) = 3.30, p = 0.003, BF10W = 13.25, one-sided, BF+0 = 24.38, strong evidence).
Taken together, these results suggest that while mental representations encode spatial relationships, the influence of spatial relationships on reaction times in imagery does not directly mirror the influence of spatial relationships in perception, since we repeatedly find evidence in favor of an absence of positive correlation. Finally, the correlation analyses yielded inconclusive results regarding the association between sensitivity to spatial parameters in Imagery and questionnaire measures (all pFDR > 0.34, all BF10W between 0.33 and 3; Table S2).
Effects of spatial parameters on response time. Individual estimates for each spatial parameter, beyond other effects, were extracted from Model 3 and tested for correlations across tasks (robust correlation test). A, Increased distance difference from fixation/Paris (task-relevant distance) was associated with a significant reduction in response time in both Perception and Imagery, with moderate evidence for no positive correlation (BF+0W = 0.12). B, Eccentricity, measured as the sum of distances from fixation/Paris, significantly impacted response times in Perception—responses were faster for stimuli located close to fixation—but had no significant effect in Imagery. Eccentricity estimates did not correlate across tasks, with strong Bayesian evidence (BF+0W = 0.09). C, Separation between stimuli had a significant effect on response time in Perception—responses were faster when points were farther apart. In the Imagery task, the mean of individual estimates was not significantly different from zero (t(27) = 0.89, p = 0.38, BF10 = 0.29), although separation had a significant fixed effect in Model 1 (*f). Random slopes for separation did not significantly correlate across tasks, with anecdotal Bayesian evidence for the absence of a positive correlation. D, As a control, we verified that individual intercepts, in the same model, correlated across tasks: fast participants in one task were also fast in the other. ***p < 0.001, *p < 0.05.
Spatial attention modulates frontal alpha power in Mental Imagery and parieto-occipital alpha power in Visual Perception
We next investigated whether attentional orienting in Perception and Imagery both rely on alpha power modulations. Using EEG, we computed the attentional PMI [PMI(f,t)] time courses at each electrode for each participant, reflecting the normalized difference in power between left cue and right cue trials. PMI values were averaged over the alpha band [PMIalpha(t), 8–12 Hz], as cue-induced modulation of power in this range has been robustly described in visuospatial attention (Worden et al., 2000; Sauseng et al., 2005; Thut et al., 2006; Rihs et al., 2007) and visual short-term memory (Poch et al., 2014; Myers et al., 2015; Leenders et al., 2018; Rösner et al., 2020). We analyzed PMIalpha(t) during attentional orienting, i.e., from cue offset (0.4 s) to response screen (1.6 s), and tested whether it differed from the null distribution using cluster-based permutation tests.
In Perception, we observed significantly higher alpha power over a cluster of 11 left parieto-occipital electrodes (CPz, CP1, CP3, Pz, P1, P3, P5, POz, PO3, PO7, O1) when attention was directed leftward compared with rightward [0.4 to 0.69 s after cue onset; cluster sum(t) = 742.35, Monte Carlo p = 0.043; Fig. 4A]. In Imagery, we also found significant alpha modulation during attentional orienting, but over a cluster of 19 bilateral frontal electrodes (Fpz, Fp1, Fp2, AFz, AF3, AF4, AF7, AF8, Fz, F1, F2, F3, F4, F5, F6, FCz, FC2, FC1, FC3), with higher alpha power when attention was directed to the left relative to the right [0.549–0.929 s after cue onset; cluster sum(t) = 1581, Monte Carlo p = 0.009; Fig. 4B]. Neither the posterior cluster in Perception nor the frontal cluster in imagery significantly covaried with individual estimates for cue validity (Model 2) in their respective tasks (Perception: r²W = 0.03, p = 0.16, BF10W = 0.50; Imagery: r²W = 0.02, p = 0.74, BF10W = 0.47). Visual inspection of time–frequency representations of PMI(f,t) over a wider frequency range (7–20 Hz) confirms that cue-induced power modulations were indeed confined to the alpha frequency range in both tasks (Fig. 4C,E).
Differential cue-induced modulations of alpha (8–12 Hz) power in Perception (blue) and Imagery (green). A, Topography of the left parieto-occipital (posterior) cluster where alpha power was significantly modulated by spatial attention in Perception [PMIalpha(t) significantly different from null, 0.4–0.69 s after cue onset] and PMIalpha(t) time course over corresponding electrodes. B, Topography of the significant frontal cluster in Imagery (from 0.549 to 0.929 s after cue onset) and PMIalpha(t) over these electrodes. C, Time–frequency representation of the power modulation index over the left parieto-occipital cluster in Perception. Cue-induced power modulations were confined to alpha band (8–12 Hz). D, Mean PMIalpha values across frontal (Front.) and posterior (Post.) clusters for Perception (blue) and Imagery (green). Solid bars represent significant clusters obtained with cluster-based permutation tests in the respective task, while dashed bars indicate the cluster in the other task. E, In Imagery, power modulations over the frontal cluster were similarly confined to the alpha band. F, Time course of PMIalpha(t) values over the frontal electrodes in Perception. G, Time course of PMIalpha(t) values over the posterior electrodes in Imagery. H, Brain regions showing the highest modulation of alpha power (>70% of maximum) in Perception during the time window [0.4 0.8 s after cue onset] were located in the left inferior temporal gyrus (ITGL), the left lateral extrastriate visual cortex (LEVCL), and the left intraparietal sulcus (IPSL). I, In Imagery, the regions showing the strongest modulation of alpha power [>70% (red) or less than −70% (black) of absolute maximum] were located in the left inferior frontal gyrus (IFGL), left inferior temporal gyrus (ITGL), and right intraparietal sulcus (IPSR). Shaded areas correspond to the standard error of the mean. **p(Monte Carlo) < 0.01, *p(Monte Carlo) < 0.05.
To further verify the spatial dissociation between conditions, we performed an ANOVA with two factors: site (Frontal, Posterior) and task (Perception, Imagery). For this analysis, PMIalpha values were averaged over two distinct time windows, corresponding to the temporal extent of the significant clusters identified in each task (0.4–0.69 s for the posterior cluster, and 0.549–0.929 s for the frontal cluster). The analysis confirmed the spatial dissociation, with a significant interaction between site and task (F(1,27) = 4.75, p = 0.04). No main effects were found for site (F(1,27) = 2.03, p = 0.16) or task (F(1,27) = 0.11, p = 0.75), supporting the conclusion that attentional orienting results in stronger posterior alpha power modulation in Perception and stronger frontal alpha power modulation in Imagery (Fig. 4D,F,G).
To ensure that the observed spatial dissociation was not driven by the use of different time windows for each site, we conducted a complementary 2 × 2 × 2 repeated-measures ANOVA with site (Frontal, Posterior), task (Imagery, Perception), and time (Early, 0.4–0.55 s; Late, 0.69–0.93 s) as within-subject factors. These nonoverlapping windows were selected to reduce ambiguity due to the original 140 ms overlap between the two tasks. The task × site interaction remained significant (F(1,27) = 5.34, p = 0.03) when controlling for time, confirming that the spatial dissociation between tasks cannot be attributed to timing differences. No main effect of time (F(1,27) = 0.12, p = 0.74) or significant site × time interaction (F(1,27) = 0.51, p = 0.48) was observed. Crucially, there was no significant interaction between task and time (F(1,27) = 0.06, p = 0.82), which rules out the possibility that similar patterns emerged in both tasks but with different latencies. Main effects of task (F(1,27) = 0.01, p = 0.91) and site (F(1,27) = 0.21, p = 0.65) were also nonsignificant. Finally, a significant three-way interaction (task × site × time: F(1,27) = 4.46, p = 0.04) suggested task-specific spatiotemporal dynamics of alpha modulations. Notably, attentional orienting effects emerged slightly later in Imagery than in Perception, consistent with the idea that mental imagery processes unfold more gradually than visual processes (Newman et al., 2007; Dijkstra et al., 2018), potentially reflecting here a less automatic strategy.
Finally, these modulations are unlikely to be driven by systematic differences in large eye movements, as trials containing saccades greater than 1° visual angle during the postcue delay were excluded during EEG preprocessing. However, because small saccades (<1°) have recently been shown to contribute to lateralized alpha power modulation in visuospatial attention tasks (Liu B. et al., 2022, 2023), we examined whether their direction varied systematically with cue direction during attentional orienting. To this end, we fitted a linear mixed-effects model of small saccade counts with task (Imagery vs Perception), cue direction (left vs right), and saccade direction (left vs right) as fixed effects and subject as a random effect. This analysis revealed a trend toward an effect of task (estimate = 3.71, t = 1.74, p = 0.084), with more saccades observed in the Imagery task (mean = 82.78, SD = 68.6) than in the Perception task (mean = 68.25, SD = 57.6), consistent with previous findings that oculomotor activity plays a role in mental imagery processes (Spivey and Geng, 2001; Bourlon et al., 2011; Fourtassi et al., 2017). No significant effects of cue direction (estimate = 1.25, t = 0.59, p = 0.56) or saccade direction (estimate = 2.54, t = 1.186, p = 0.24) were found, and no significant two-way or three-way interaction emerged (all p > 0.39). These results indicate that saccade direction was not systematically aligned with cue direction and thus unlikely to account for the observed modulations of alpha power.
Distinct sources of alpha modulation in Imagery and Perception
To determine the regions involved, we reconstructed the sources of alpha power within the significant time windows for each task and computed PMIalpha at the source level. In Perception, the brain regions showing the highest modulation of alpha power (>70% of absolute maximum) were localized in the left lateral extrastriate visual cortex (LEVCL, MNI coordinates: −45, −75, 10), the left inferior temporal gyrus (ITGL, MNI coordinates: −52, −60, −23), and the left intraparietal sulcus (IPSL, MNI coordinates: −28, −70, 38; Fig. 4H). In Imagery, the brain regions showing the highest modulation of alpha power (>70% of absolute maximum) were localized in the left inferior frontal gyrus Brodmann 44 (IFGL, MNI coordinates: −50, 10, 10), in the left inferior temporal gyrus (MNI coordinates: −56, −58, −22), and in the right parieto-occipital region, centered on the right intraparietal sulcus (IPSR, MNI coordinates: 30, −90, 30; Fig. 4I). As the brain regions where alpha activity is most sensitive to attentional orienting appeared more distributed in Mental Imagery than in Visual Perception, we investigated whether this broader attentional modulation profile corresponded with greater variability in strategies employed during the task. We conducted an exploratory analysis by examining correlations between PMIalpha(t) values of the three key regions during Imagery and participants’ sensitivity to the two spatial parameters that show a significant effect on response times (task-relevant distance and separation). These tests provided inconclusive evidence, with all Bayes factors falling within the anecdotal range (all pFDR > 0.16, all BF between 0.33 and 3; Table S3). We further examined whether alpha modulation profiles varied with general imagery styles, as assessed by the OSIVQ. However, no significant associations were found between PMIalpha(t) values at the source level and OSIVQ subscores (all pFDR > 0.73, all BF in the anecdotal range; Table S4).
Discussion
The cue validity effect, characterized by shorter response times following valid attentional cues compared with invalid ones, is a robust indicator of spatial attention in both visual perception (Posner, 1980; Corbetta and Shulman, 2002) and working memory (Griffin, 2003; Nobre et al., 2007). By asking participants to orient their attention within the mental map of France, we show that spatial attention can also operate on long-term memory-based mental imagery. However, this orienting differed from visual spatial attention in perception. First, attentional benefits in reaction times were not positively correlated across tasks. Second, spatial attention in Imagery was associated with frontal alpha-band modulation, whereas perceptual attention involved the well-documented parieto-occipital alpha modulation. Beyond attention, spatial relationships between items influenced behavior in both tasks, but with distinct characteristics. Altogether, our results show that both spatial organization and spatial attentional orienting are present in general knowledge imagery but with markedly distinct behavioral and neural characteristics as compared with perception.
Functional spatial organization in Imagery differs from Perception
Participants, primarily geography students with verified spatial knowledge of French city locations, performed the Imagery task with high accuracy, relying solely on general knowledge. As expected, we found that response times were influenced by the task-relevant distance. This aligns with the idea that semantic imagery can support spatial computations, reminiscent of findings in mental scanning (Kosslyn et al., 1978; Denis et al., 1995), even if here participants were not required to navigate between cities. However, this observation alone is insufficient to prove a spatial format, as participants could have compared memorized distance values propositionally. Yet, we found that at least one task-irrelevant spatial parameter also influenced responses, indicating that participants at least partially relied on a spatial format rather than solely on a propositional representation.
Still, our results suggest that this spatial format is not analogous to perception. A depictive format, one that preserved all spatial relationships as in visual experience, would predict consistent effects across tasks (Kosslyn et al., 2006). However, we observed significant differences in how eccentricity affected response times in Imagery versus Perception, with robust evidence against a positive correlation between the two tasks. This indicates that spatial relationships in imagery are only partially recreated, possibly due in part to nonspatial strategies, and are represented in a format that does not match visual perception. These results challenge the assumption that mental imagery space is isomorphic to visual space, at least in the specific case of topological representations like geographical maps (Guariglia et al., 2013; Boccia et al., 2015).
Spatial attention in general knowledge imagery engages distinct neural mechanisms
We found evidence that spatial attention operates in mental images drawn from long-term memory. However, our results indicate that its mechanisms differ from those of visuospatial attention: behavioral effects were not correlated across tasks, and spatial attention was associated with distinct patterns of alpha activity modulation. In Perception, spatial attention modulated alpha-band activity over occipito-parietal sensors. At the source level, these modulations reflected a difference in alpha power between attention oriented to the left versus to the right in a large left posterior brain region centered on the lateral extrastriate visual cortex (LEVC), extending superiorly to the intraparietal sulcus (IPS) and inferiorly to the inferior temporal gyrus (ITG), corresponding to the typical alpha signatures of visuospatial attention (Worden et al., 2000; Sauseng et al., 2005; Thut et al., 2006; Gould et al., 2011; Capilla et al., 2012; Cruz et al., 2024). Note that because we contrast between attention toward the left versus right, brain regions known to be engaged in attentional orienting in general, independently from the direction, are absent—such as the right IPS (Corbetta and Shulman, 2002; Thiebaut de Schotten et al., 2011; Bartolomeo and Seidel Malkinson, 2019).
In Imagery, West versus East spatial attention prominently modulated alpha activity over fronto-central electrodes. These modulations occurred even though reliance on semantic information was minimal, as they emerged before the presentation of the city names. They involved three main regions: the right intraparietal sulcus (IPS), the left inferior frontal gyrus (IFG), and the left inferior temporal gyrus (ITG). Traditionally associated with the ventral visual stream and involved in visual shape and object perception (Ungerleider and Haxby, 1994; Grill-Spector and Weiner, 2014), the left ITG region is the only one where we find a convergence between attentional contrasts in perception (left vs right) and imagery (West vs East).
Alpha activity modulations in IPS during spatial orienting in Imagery indicate the involvement of the dorsal attention network, classically described in visuospatial attention (Corbetta et al., 1998; Corbetta and Shulman, 2002). Here, we report an asymmetry in IPS involvement, with a predominance of right-lateralized modulations of alpha power in Imagery and left-lateralized modulations in Perception. Note that contrasts of attention direction (left vs right, West vs East) as presented here cannot be directly compared with contrasts in imagery versus perception as often done in the brain imaging literature (Spagna et al., 2021; Liu J. et al., 2022); hence, comparison of lateralization patterns is difficult.
Last but not least, the modulation of alpha-band activity in the left IFG during Imagery, which significantly contributes to the observed frontal topography, suggests the recruitment of mechanisms entirely distinct from those underlying visuospatial attention. This region has been strongly associated with different aspects of language processing (Costafreda et al., 2006; Uddén and Bahlmann, 2012), inner speech (McGuire et al., 1996; Morin and Michaud, 2007), and internal attention (Hasenkamp et al., 2012; Scheibner et al., 2017) but is also potentially linked to the ventral attention network (Silvetti et al., 2016; Mengotti et al., 2020), although a right-hemispheric dominance is usually observed (Corbetta and Shulman, 2002; Corbetta et al., 2008; Thiebaut de Schotten et al., 2011; Igelström et al., 2015).
Overall, our results challenge the assumption that spatial attention in mental imagery relies on the same mechanisms as visuospatial attention. In this sense, they might offer new insights into deficits in spatial attention during mental imagery, such as imaginal neglect. This syndrome often co-occurs with perceptual neglect, supporting the idea of a shared exploratory bias (Bartolomeo et al., 1994; Bourlon et al., 2008). The convergence of attentional modulations in both imagery and perception within the left ITG may contribute to this overlap. On the other hand, the existence of dissociations between perceptual and imaginal neglect, including pure cases of imaginal neglect (Bisiach and Luzzatti, 1978; Guariglia et al., 1993, 2013; Beschin et al., 1997; Cocchini et al., 2004), suggests that distinct mechanisms may also be involved. Our findings add to this view by showing that spatial attention orienting in imagery does not always rely on the reuse of visuospatial attention mechanisms, even when the internal representation is highly spatialized, as in the case of a geographic map (Boccia et al., 2015).
Spatial attention and the diversity of mental imagery
We show that mental imagery, when based on representations from general knowledge memory, is not analogous to visual perception—neither in its spatial format nor in the mechanisms of spatial attention it engages. This lack of analogy contrasts with findings on representations in visual working memory. Indeed, spatial attention orienting in working memory is associated with alpha-band activity modulation patterns similar to those observed in visuospatial attention (Poch et al., 2014; Myers et al., 2015; Foster et al., 2017; Leenders et al., 2018; Rösner et al., 2020; Sutterer et al., 2021), aligning with a sensory recruitment model, which posits that holding and manipulating stimuli in working memory engage the same neural mechanisms as their perception (D’Esposito, 2007; Scimeca et al., 2018). We show that the analogy to visual perception breaks down when mental images are derived from long-term memory, even when they are spatially organized. This divergence may stem from fundamental differences in the type of mental image involved. Prior work suggests that the degree of overlap between imagery and perception depends on how closely mental images adhere to perceptual details (Dijkstra et al., 2017; Dijkstra, 2024). Representations held in working memory are often anchored to recent percepts and retain low-level visual features, whereas mental images based on long-term memory may be less tied to sensory detail (Bigelow et al., 2023). Moreover, certain forms of imagery, such as mental maps, possess topological structure and are likely supported by neural networks distinct from those underlying nontopological imagery (Guariglia et al., 2013; Boccia et al., 2015). These results emphasize that the memory source, structure, and functional purpose of a mental image not only shape the neural mechanisms it engages (Spagna et al., 2024) but also determine the extent to which it overlaps with perception (Pearson et al., 2008; Dijkstra, 2024).
Conclusion
We show that a mental image derived from long-term memory can adopt a spatial format, within which spatial attention orienting is possible. However, these properties are not analogous to those of visual perception. Our results contrast with previous findings on visual working memory and underscore the diversity of mechanisms engaged in the exploration of a mental image.
Footnotes
This work was supported by the Fondation pour la Recherche Médicale (grant FDT202404018363) and the Institut National de la Santé et de la Recherche Médicale (Poste d’accueil) to A.C., as well as by grants ANR-17-EURE-0017 and ANR-10-IDEX-0001-02.
The authors declare no competing financial interests.
This paper contains supplemental material available at: https://doi.org/10.1523/JNEUROSCI.0691-25.2025
Correspondence should be addressed to Anthony Clément at anthony.clement{at}ens.psl.eu or Catherine Tallon-Baudry at catherine.tallon-baudry{at}ens.psl.eu.