Abstract
Priority map theory is a leading framework for understanding how various aspects of stimulus displays and task demands guide visual attention. Per this theory, the visual system computes a priority map, which is a representation of visual space indexing the relative importance, or priority, of locations in the environment. Priority is computed from both salience, defined by image-computable stimulus properties, and relevance, defined by an individual's current goals, and is used to direct attention to the highest-priority locations for further processing. Computational theories suggest that priority maps identify salient locations based on individual feature dimensions (e.g., color, motion), which are integrated into an aggregate priority map. While widely accepted, a core assumption of this framework, the existence of independent feature dimension maps in visual cortex, remains untested. Here, we tested the hypothesis that retinotopic regions selective for specific feature dimensions (color or motion) in human cortex act as neural feature dimension maps, indexing salient locations based on their preferred feature. We used fMRI activation patterns to reconstruct spatial maps while male and female human participants viewed stimuli with salient regions defined by relative color or motion direction. Activation in reconstructed spatial maps was localized to the salient stimulus position in the display. Moreover, the strength of the stimulus representation was greatest in the ROI selective for the salience-defining feature. Together, these results suggest that feature-selective extrastriate visual regions highlight salient locations based on local feature contrast within their preferred feature dimensions, supporting their role as neural feature dimension maps.
SIGNIFICANCE STATEMENT Identifying salient information is important for navigating the world. For example, it is critical to detect a quickly approaching car when crossing the street. Leading models of computer vision and visual search rely on compartmentalized salience computations based on individual features; however, there has been no direct empirical demonstration identifying neural regions as responsible for performing these dissociable operations. Here, we provide evidence of a critical double dissociation that neural activation patterns from color-selective regions prioritize the location of color-defined salience while minimally representing motion-defined salience, whereas motion-selective regions show the complementary result. These findings reveal that specialized cortical regions act as neural “feature dimension maps” that are used to index salient locations based on specific features to guide attention.
Introduction
Often, we search for items that are relevant to ongoing goals, such as the coffee maker in the morning. However, objects within a given scene are constantly vying for our attention. A salient but task-irrelevant object, like a bright yellow banana on the counter or a cat leaping across the kitchen, may distract our attention and slow the search for coffee. One prominent model that highlights the competition between task-irrelevant salience and task-relevant goals in guiding attention is priority map theory (Treisman and Gelade, 1980; Wolfe, 1994; Itti and Koch, 2001; Fecteau and Munoz, 2006; Serences and Yantis, 2006; Awh et al., 2012). Per this theory, the activity profile across a priority map reflects a combination of task-relevant locations and salient, but irrelevant, locations, and is used to direct attention to the highest priority locations (Carrasco, 2011; Eckstein, 2011; Yu et al., 2023).
To compute the bottom-up salience associated with a given location, priority map theory posits that information about individual feature dimensions (e.g., color, motion, etc.) is independently extracted from retinal input into a series of “feature dimension maps.” For a given feature dimension map, salient regions of space are defined based on within-dimension local feature contrast, such as the aberrant color of the yellow banana or motion direction of the leaping cat (Itti and Koch, 2000, 2001). A conspicuous location defined by a given feature dimension is given high activation in the corresponding feature dimension map (Fig. 1). Activity profiles across various feature dimension maps are then integrated into a unified feature-agnostic priority map, which indexes the most important locations within the visual field, regardless of the source of their importance.
Feature dimension maps index salient locations based on their preferred feature dimension. Priority map theory invokes "feature dimension maps" to compute salient locations based on local feature contrast within each feature dimension (e.g., color, motion). Accordingly, when a location in a stimulus display is made salient based on local differences in motion direction, activation profiles over a "motion map" should track the salient location, while a "color map" would not. Similarly, activation in a "motion map" corresponding to a salient motion stimulus should be stronger than its activation when a location is made salient based on local differences in color. Complementary results would be predicted for a "color map," with stronger activation at the location of a salient color stimulus compared with a salient motion stimulus. While these feature dimension maps productively account for behavioral results in visual search tasks, it remains unknown whether salient locations are independently indexed in different feature-selective regions in visual cortex.
Studies in humans and nonhuman primates have identified stronger neural responses associated with salient stimulus locations than those associated with nonsalient locations throughout the brain (Bichot and Schall, 1999; Bisley and Goldberg, 2006; Bogler et al., 2011, 2013; White et al., 2017), consistent with a neural instantiation of a priority map (Fecteau and Munoz, 2006; Serences and Yantis, 2006; Bisley and Goldberg, 2010; Katsuki and Constantinidis, 2014). However, despite this converging neural evidence offering strong support for the implementation of feature-agnostic priority maps, support for feature dimension maps is primarily based on behavioral studies measuring visual search response times (Folk et al., 1992; Theeuwes, 1992; Bacon and Egeth, 1994; Müller et al., 1995; Treisman, 1998; Wolfe and Horowitz, 2004; Huang and Pashler, 2007; Folk and Anderson, 2010). Indeed, neural studies have all either focused on stimulus displays involving a single salient feature dimension (Moran and Desimone, 1985; Cook and Maunsell, 2002; Beck and Kastner, 2005; Zhang et al., 2012), measured from a single feature-selective visual region (Martínez-Trujillo and Treue, 2002; Mazer and Gallant, 2003; Reynolds and Desimone, 2003; Ogawa and Komatsu, 2004, 2006; Bichot et al., 2005; Burrows and Moore, 2009; Klink et al., 2023), or studied feature-agnostic salience (Gottlieb et al., 1998; Bisley and Goldberg, 2006; Bogler et al., 2011, 2013; Sprague et al., 2018b). Thus, despite the key theoretical role specific stimulus feature dimensions are believed to play in computing representations of stimulus salience, it remains unknown whether the brain implements this compartmentalized computational architecture.
Here, we sought to resolve this question by testing the hypothesis that feature-selective retinotopic regions of visual cortex preferentially index salient locations based on their preferred feature dimension. Within retinotopic ROIs, we characterized feature selectivity and spatial salience computations for task-irrelevant visual stimuli defined by different feature dimensions using a spatial inverted encoding model (IEM). Participants attended a central fixation point while viewing stimuli typically containing one salient location. Across trials, we varied the salience-defining stimulus feature (color, motion, or a single salient stimulus in isolation). If color-selective retinotopic regions hV4/VO1/VO2 (Brewer et al., 2005; Conway et al., 2007; Mullen, 2019) and motion-selective retinotopic regions TO1/TO2 (Albright, 1993; Huk et al., 2002; Amano et al., 2009) act as neural feature dimension maps, salient stimuli will result in patterns of multivariate BOLD responses containing strong activation at the salient location that are strongest when defined by a region's preferred feature dimension (Fig. 1). Consistent with these predictions, we found that reconstructed spatial maps in color-selective regions indexed color-based salience, and motion-selective regions indexed motion-based salience, with each region preferentially representing salient locations based on their feature dimension.
Materials and Methods
Participants
Eight subjects recruited from the University of California–Santa Barbara (UCSB) community participated in the primary fMRI study (6 female, 18-27 years old). Pilot data (n = 3) confirmed that this sample size allowed for adequate power to detect our effects of interest (dz = 3.10). We opted to collect a large number of measurements from each subject to minimize within-subject variance, which often benefits statistical power more than increased sample sizes (Baker et al., 2021). All subjects reported normal or corrected-to-normal vision and did not report neurologic conditions. Procedures were approved by the UCSB Institutional Review Board (#2-20-0012). All subjects gave written informed consent before participating and were compensated for their time ($20/h for scanning sessions, $10/h for behavioral familiarization/training).
Stimuli and procedure
Participants performed a 30 min training session before scanning to familiarize them with the task instructions. We used this session to establish the initial behavioral performance thresholds used in the first run of the scanning session. In the main task session, we scanned participants for a single 2 h period consisting of at least four mapping task runs, which we used to independently estimate encoding models for each voxel, and eight experimental feature-salience task runs. All participants also underwent additional anatomic and retinotopic mapping scanning sessions (1-2 × 1.5-2 h sessions) to identify ROIs (see ROI definition). Additionally, most participants (n = 6) underwent an independent functional localizer session, which we used to verify that retinotopically defined ROIs were feature-selective.
Stimuli were presented using the Psychophysics toolbox (Brainard, 1997; Pelli, 1997) for MATLAB (The MathWorks). Visual stimuli were rear-projected onto a screen placed ∼110 cm from the participant's eyes at the head of the scanner bore using a contrast-linearized LCD projector (1920 × 1080, 60 Hz) during the scanning session. In the behavioral familiarization session, we presented stimuli on a contrast-linearized LCD monitor (2560 × 1440, 60 Hz) 62 cm from participants, who were seated in a dimmed room and positioned using a chin rest. For all sessions and tasks (main tasks, localizers, and mapping task), we presented stimuli on a neutral gray circular aperture (9.5° radius), surrounded by black (only aperture shown in Fig. 2).
Feature-salience task
For the main task (Fig. 2) and functional localizers (see below), participants attended a flashing cross within the fixation circle and ignored any other stimuli presented throughout the scanning session. This task localized goal-directed attention to fixation and was equivalent across all stimulus conditions, allowing us to isolate signals associated with bottom-up salience processing of our peripheral stimuli. Participants monitored the fixation cross throughout the whole run for any increase in length in either the vertical or horizontal bar of the cross and responded to changes with a button press (the left button for a horizontal target, the right button for a vertical target). The vertical and horizontal lines of the fixation cross were 0.25° of visual angle long and flickered at 3 Hz (10 frames on, 10 frames off at 60 Hz). Whenever the cross was visible, there was a 22.5% chance that either line had a small change in length. To ensure that participants maintained vigilant attention at fixation throughout the entire experiment, they performed an initial behavioral training session where they practiced the fixation task several times. Between runs of the practice session, we adjusted the degree of size change for the vertical/horizontal lines until they consistently achieved ∼80% accuracy. We further adjusted the difficulty of the fixation task between runs of the scanning session by altering the degree of size change for vertical/horizontal lines based on behavioral accuracy (range: 0.05°-0.125°). Participants performed the fixation task continuously throughout both stimulus presentation periods and intertrial intervals (ITIs) to ensure that salient events were temporally decoupled from fixation task performance and/or target detection.
Feedback for each response to the fixation task was given via the aperture around fixation changing color for 0.5 s, with green representing a correct button press response, red representing an incorrect button press response, and yellow representing no response (missed a target). Feedback was given 1 s after a target was presented. A fixation target was never present for the first or last 2 s of a trial, or for 2 s after the presentation of a previous target.
The critical, ignored stimuli were a color- or motion-defined salient location presented as a circular disk within random dot arrays spanning the entire stimulus aperture, except for a region around fixation (0.75°). On color trials, static dots within a disk were presented in the "opposite" color (hue, saturation, value [HSV] color space) compared with the background dots; on motion trials, moving black and white dots within a disk moved in the opposite direction compared with the background dots. For example, if the dot array contained dots moving at 0° (to the right), the motion-defined salient location would contain dots moving at 180° (to the left). Similarly, if the dot array contained static colored dots with a red hue (H = 0°), then the color-defined salient location would contain dots with a green hue (H = 180°). Individual dots occupied 0.05° of visual angle, and dot density was 15 dots/deg². In the motion array, dots moved at a speed of 9.5°/s in a randomly selected planar direction, each dot was randomly colored black or white (100% contrast), and dots were randomly replotted every 50 ms or when they exceeded the stimulus bounds. In the color array, all dots remained static and were assigned a random hue value, with dot locations updated every 333 ms. Both arrays updated every 333 ms during the 5 s presentation period, such that a new color or motion value was applied to every dot in the updated array 3 times per second. Trials started with the onset of the peripheral dot array while participants attended fixation. The salient location was present throughout the entire stimulus interval, centered 5° from fixation at a random location along an invisible ring spanning 0°-359°, and had a radius of 1.5°. While the location of the salient stimulus remained constant within a trial, it varied randomly between trials along this invisible ring.
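The "opposite feature" construction for a color trial can be sketched in a few lines; the dot count and helper name below are illustrative assumptions, not values drawn from the authors' stimulus code:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_dot_array(n_dots=500, aperture_r=9.5, inner_r=0.75,
                   salient_center=(5.0, 0.0), salient_r=1.5,
                   background_hue=0.0):
    # Rejection-sample dot positions inside the aperture, outside the
    # fixation zone (0.75 deg), then keep the first n_dots
    xy = rng.uniform(-aperture_r, aperture_r, size=(n_dots * 4, 2))
    r = np.hypot(xy[:, 0], xy[:, 1])
    xy = xy[(r <= aperture_r) & (r >= inner_r)][:n_dots]
    # Background dots share one hue; dots inside the salient disk get the
    # "opposite" hue, 180 deg away on the HSV hue circle
    d = np.hypot(xy[:, 0] - salient_center[0], xy[:, 1] - salient_center[1])
    hue = np.full(len(xy), background_hue)
    hue[d <= salient_r] = (background_hue + 180.0) % 360.0
    return xy, hue

xy, hue = make_dot_array()
```

On motion trials, the same logic would apply to motion direction (θ + 180°) rather than hue.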
We included three additional control conditions intermixed with salience-present trials. First, to ensure that spatially localized activation was due to the presence of the salient location, we presented colored static dots and moving black and white dots with no salient location defined ("salience-absent" trials). Second, as a positive control to ensure that our image reconstruction procedure was effective in each retinotopic ROI, we presented a flickering checkerboard disk (spatial frequency 0.679 cycles/°) on a gray background at the same size, eccentricity, and duration as the salient disks ("checkerboard" trials; similar to previous reports; Sprague and Serences, 2013). The checkerboard stimulus flickered at a rate of 6 Hz and was considered feature-agnostic with respect to the key manipulations in the study (i.e., color/motion). All trials were separated by a randomly selected ITI ranging from 6 to 9 s (mean ITI, 7.5 s). Each run had 24 trials: six trials of each salience-present condition (color, motion, checkerboard) and three trials each of the salience-absent color and motion conditions. Trial order was shuffled within each run. Each run started with a 3 s blank period and ended with a 10.5 s blank period, for a total run duration of 313.5 s (for one participant, we acquired 416 TRs for one run instead of 418, resulting in 312 s of data for this run). Eye position was monitored throughout the experiment using an Eyelink 1000 eyetracker (SR Research).
Additionally, we performed a control eyetracking experiment (n = 10, 9 female, 18-27 years old) outside of the scanner to ensure that our stimuli were sufficiently salient to capture attention, as indexed by saccades directed to salient stimulus locations. Subjects viewed the same displays as in the MRI version of the experiment with the following exceptions: each stimulus was presented for 1 s, ITIs were reduced to 3 s, each subject viewed a total of 180 stimuli (36 occurrences of each stimulus condition), and subjects were encouraged to freely view the display with no instructions to perform a fixation task (the fixation task stimulus appeared on-screen, but participants were never instructed to report aspects of the stimulus). The eyetracker sampled right eye gaze position at 500 Hz. Participants performed a nine-point calibration procedure before each run with a viewing distance of 58 cm while seated in a chin and forehead rest.
Spatial mapping task
We also acquired several runs of a spatial mapping task used to independently estimate a spatial encoding model for each voxel, following previous studies (Sprague and Serences, 2013; Sprague et al., 2016, 2018b). On each trial of the mapping task, we presented a flickering checkerboard at different positions selected from a hexagonal grid spanning the screen. Participants viewed these stimuli and responded whenever a rare contrast change occurred (10 of 47 trials, 21.3%), evenly split between contrast increments and decrements. The checkerboard stimulus was the same size as the salient locations in the feature salience task (1.5° radius) and was presented at 70% contrast and 6 Hz full-field flicker. All stimuli appeared within a gray circular aperture with a 9.5° radius, as in the feature-salience task. For each trial, the location of the stimulus was selected from a triangular grid of 37 possible locations with an added random uniform circular jitter (0.5° radius). The base position of the triangular grid was rotated by 30° on every other scanner run to increase spatial sampling density. As a result, every mapping trial was unique, which enabled robust spatial encoding model estimation.
Each trial started with a 3000 ms stimulus presentation period. If a target was present, then the stimulus would be dimmed/brightened for 500 ms with the stipulation that the contrast change would not occur in either the first or last 500 ms of the stimulus presentation period. Finally, there was an ITI ranging from 2 to 6 s (uniformly sampled using the linspace command in MATLAB [linspace(2, 6, 47)]). All target-present trials were discarded when estimating the spatial encoding model. Each run consisted of 47 trials (10 of which included targets). We also included a 3 s blank period at the beginning of the run and a 10.5 s blank period at the end of the run. Each run totaled 432 s.
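The jittered triangular sampling grid described above can be sketched as follows; the grid spacing and random seed are illustrative assumptions, not values from the authors' code:

```python
import numpy as np

rng = np.random.default_rng(0)

def triangular_grid(n_rings=3, spacing=2.4):
    """Triangular grid built as concentric hexagonal rings around fixation:
    1 + 6 + 12 + 18 = 37 points for n_rings = 3."""
    pts = []
    for q in range(-n_rings, n_rings + 1):
        for r in range(-n_rings, n_rings + 1):
            if max(abs(q), abs(r), abs(q + r)) <= n_rings:  # hex-lattice cutoff
                pts.append((spacing * (q + r / 2), spacing * r * np.sqrt(3) / 2))
    return np.array(pts)

def jitter(points, radius=0.5):
    """Uniform circular jitter (0.5 deg radius) applied to each base position."""
    theta = rng.uniform(0, 2 * np.pi, len(points))
    rho = radius * np.sqrt(rng.uniform(0, 1, len(points)))  # uniform over the disk
    return points + np.c_[rho * np.cos(theta), rho * np.sin(theta)]

base = triangular_grid()     # 37 base positions
positions = jitter(base)     # one unique jittered position per trial
```

Rotating `base` by 30° on alternate runs (a rotation matrix applied to the grid) would then double the effective sampling density, as described above.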
Retinotopic mapping task
We used a previously reported task (Mackey et al., 2017) to identify retinotopic ROIs via the voxel receptive field (vRF) method (Dumoulin and Wandell, 2008). Each run of the retinotopy task required participants to attend several random dot kinematograms (RDKs) within bars that swept across the visual field in 2.25 s (or, for one participant, 2.6 s) steps. Three equally sized bars were presented on each step, and participants reported with a button press which of the two peripheral bars matched the motion direction of the central bar. Participants received feedback via a red or green color change at fixation. We used a three-down/one-up staircase to maintain ∼80% accuracy throughout each run so that participants would continue to attend the RDK bars. RDK bars swept across 17.5° of the visual field. Bar width and sweep direction were pseudo-randomly selected from several different widths (ranging from 2.0° to 7.5°) and four directions (left-to-right, right-to-left, bottom-to-top, and top-to-bottom).
Functional localizer tasks
To independently identify color- and motion-selective voxels, we scanned participants while they performed three runs each of a color and motion localizer task (Bartels and Zeki, 2000; Huk et al., 2002) using a blocked design. During both tasks, the participant performed the same fixation task from the feature-salience attention task, where they monitored a central cross for changes in the size of the horizontal and vertical lines. In the color localizer task, participants viewed colored or grayscale rectangles of various sizes within the same aperture dimensions described in the spatial mapping task (see Spatial mapping task). Stimuli were presented spanning the entire aperture. Rectangle colors were individually sampled from the entire RGB color space (uniform independent random distribution of R, G, and B). Similarly, each grayscale rectangle had a randomly sampled contrast (identical R, G, and B value, randomly sampled for each rectangle). During the motion localizer, participants viewed either static or moving black and white dots. For the moving dots, motion could be clockwise, counterclockwise, or planar (20 evenly spaced steps from 18° to 360°). Dots were redrawn every 100 ms or when they exceeded the stimulus boundary. When the array contained planar motion, dots moved at 1.2°/s; if clockwise/counterclockwise motion, 0.6°/s. Within each block, each stimulus (static/motion or color/grayscale) was shown for 400 ms followed by a 100 ms blank period before the next stimulus presentation. Each block lasted 18 s (36 updates of stimulus feature values), with feature values randomly selected each presentation. During each scanning run, we presented 6 total blocks, alternating between grayscale rectangles (static dots) and colored rectangles (moving dots) for the color (motion) localizer runs. At the end of each run, participants viewed a blank screen while performing the fixation task for 18 s. 
Runs started with a 3 s blank period and ended with a 10.5 s blank period. There was no fixation task during the start and end blank periods. Each run lasted 229.5 s.
We acquired localizer data for 6 of 8 participants (the other 2 participants were unable to return to complete the localizer session).
fMRI acquisition
fMRI data acquisition and preprocessing pipelines in the current study closely followed a previous report (Hallenbeck et al., 2021) but with slight modifications. We acquired all functional and anatomic images at the UCSB Brain Imaging Center using a 3T Siemens Prisma scanner. fMRI scans for experimental, model estimation, retinotopic mapping, and functional localizers were acquired using the CMRR MultiBand Accelerated EPI pulse sequences. We acquired all images with the Siemens 64 channel head/neck coil with all elements enabled. We acquired both T1- and T2-weighted anatomic scans using the Siemens product MPRAGE and Turbo Spin-Echo sequences (both 3D) with 0.8 mm isotropic voxels, 256 × 240 mm slice FOV, and TE/TR of 2.24/2400 ms (T1w) and 564/3200 ms (T2w). We collected 192 and 224 slices for the T1w and T2w, respectively. We acquired three T1 images, which were aligned and averaged to improve signal-to-noise ratio.
For all functional scans, we used a multiband (MB) 2D GE-EPI scanning sequence with MB factor of 4, acquiring 44 interleaved slices with no gap (isotropic voxel size 2.5 mm), TE/TR: 30/750 ms, and P-to-A phase encode direction, to measure BOLD contrast images. For retinotopic mapping of one participant (sub004), we used an MB 2D GE-EPI scanning sequence acquiring 56 interleaved slices (isotropic voxel size 2 mm) with TE/TR: 42/1300 ms. We measured field inhomogeneities by acquiring spin echo images with normal and reversed phase encoding (3 volumes each), using a 2D SE-EPI sequence with readout matching that of the GE-EPI, the same number of slices, no slice acceleration, and TE/TR: 45.6/3537 ms (TE/TR: 71.8/6690 ms for sub004's retinotopic mapping session).
MRI preprocessing
Our approach for preprocessing was to coregister all functional images to each participant's native anatomic space. First, we used all intensity-normalized high-resolution anatomic scans (3 T1 images and 1 T2 image for each participant) as input to the “hi-res” mode of Freesurfer's recon-all script (version 6.0) to identify pial and white matter surfaces. Processed anatomic data for each participant was used as the alignment target for all functional datasets which were kept within each participant's native space. We used AFNI's afni_proc.py to preprocess functional images, including motion correction (6-parameter affine transform), unwarping (using the forward/reverse phase-encode spin echo images), and coregistration (using the unwarped spin-echo images to compute alignment parameters to the anatomic target images). We projected data to the cortical surface, then back into volume space, which incurs a modest amount of smoothing perpendicular to the cortical surface. To optimize distortion correction, we divided functional sessions into 3-5 subsessions, which consisted of 1-4 fMRI runs and a pair of forward/reverse phase encode direction spin echo images each, which were used to compute that subsession's distortion correction field. For the feature salience and mapping task, we did not perform any spatial smoothing beyond the smoothing introduced by resampling during coregistration and motion correction. For retinotopic mapping and functional localizer scans, we smoothed data by 5 mm FWHM on the surface before projecting back into native volume space.
ROI definition
We identified 15 ROIs using independent retinotopic mapping data. We fit a vRF model for each voxel in the cortical surface (in volume space) using averaged and spatially smoothed (on the cortical surface; 5 mm FWHM) time series data across all retinotopy runs (8-12 per participant). We used a compressive spatial summation isotropic Gaussian model (Kay et al., 2013a; Mackey et al., 2017) as implemented in a customized, GPU-optimized version of mrVista (for detailed description of the model, see Mackey et al., 2017). High-resolution stimulus masks were created (270 × 270 pixels) to ensure similar predicted responses within each bar size across all visual field positions. Model fitting began with an initial high-density grid search, followed by subsequent nonlinear optimization. We visualized retinotopic maps by projecting vRF best-fit polar angle and eccentricity parameters with variance explained ≥10% onto each participant's inflated cortical surfaces via AFNI and SUMA (see Fig. 4). We drew retinotopic ROIs (including V1, V2, V3, V3AB, hV4, LO1, LO2, VO1, VO2, TO1, and TO2) on each participant's cortical surface based on these visualizations.
For primary analyses (see Figs. 4, 6), we aggregated across color-selective maps (hV4, VO1, VO2) and motion-selective maps (TO1, TO2) as reported in previous literature (Albright, 1993; Huk et al., 2002; Brewer et al., 2005; Conway et al., 2007; Amano et al., 2009; Mullen, 2019) by concatenating all voxels for which the best-fit vRF model explained at least 10% of the signal variance before univariate or multivariate analyses. For completeness, we also conducted all analyses for each individual ROI using the same voxel selection threshold (see Figs. 5, 7). We verified that our ROIs based on retinotopic mapping exhibited typical color- and motion-selective responses during our localizer tasks. The motion localizer revealed significant motion-related activation (permuted one-sample t test comparing activation in response to moving dots to activation in response to static dots) in motion-selective retinotopic ROIs TO1/TO2 (p < 0.001) as well as in hV4/VO1/VO2 (p = 0.011). The color localizer identified significant color-related activation (permuted one-sample t test comparing activation in response to colored rectangles to activation in response to grayscale rectangles) in color-selective retinotopic ROIs hV4/VO1/VO2 (p < 0.001), but not in motion-selective retinotopic ROIs TO1/TO2 (p = 0.154).
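The threshold-and-concatenate step can be sketched as follows (array shapes and variance-explained values are illustrative, not real data):

```python
import numpy as np

def select_voxels(data_by_roi, r2_by_roi, thresh=0.10):
    """Concatenate voxels across ROIs, keeping only those whose best-fit vRF
    model explained at least `thresh` of the signal variance."""
    kept = [d[:, r2 >= thresh] for d, r2 in zip(data_by_roi, r2_by_roi)]
    return np.concatenate(kept, axis=1)

# Toy shapes: two motion-selective maps (e.g., TO1 and TO2), time x voxels
to1, to2 = np.zeros((100, 50)), np.zeros((100, 30))
r2_to1, r2_to2 = np.linspace(0, 1, 50), np.linspace(0, 1, 30)
motion_map = select_voxels([to1, to2], [r2_to1, r2_to2])
```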
IEM
We used a spatial IEM to reconstruct images based on stimulus-related activation patterns measured across entire ROIs (Sprague and Serences, 2013) (see Fig. 4A). To do this, we first estimated an encoding model, which describes the sensitivity profile over the relevant feature dimension for each voxel in a region. This requires using data set aside for this purpose, referred to as the “training set.” Here, we used data from the spatial mapping task as the independent training set. The encoding model across all voxels within a given region is then inverted to estimate a mapping used to transform novel activation patterns from a “test set” (runs from the feature salience task) and reconstruct the spatial representation of the stimulus at each time point.
We built an encoding model for spatial position based on a linear combination of 37 spatial filters (Sprague and Serences, 2013; Sprague et al., 2014, 2018b). Each voxel's response was modeled as a weighted sum of identically shaped spatial filters arrayed in a triangular grid (see Fig. 4A). The centers of each filter were spaced by 2.83°, and each filter was a cosine function raised to the seventh power as follows:

f(r) = (0.5 + 0.5 cos(πr/s))⁷ for r < s; 0 otherwise

where r is the distance between a visual field position and the filter center and s is a size constant determining filter width. Projecting the mapping stimulus on each trial onto this basis set yields a matrix of predicted channel responses, C1 (n trials × k channels), and the measured responses during the mapping task were modeled as

B1 = C1W

B1 (n trials × m voxels) in this equation is the measured fMRI activity of each voxel during the visuospatial mapping task, and W is a weight matrix (k channels × m voxels), which quantifies the contribution of each information channel to each voxel. We estimated W using ordinary least squares:

Ŵ = (C1ᵀC1)⁻¹C1ᵀB1

This is computed for each voxel within a region independently, making this step univariate. The resulting weight estimate Ŵ was then inverted to transform novel activation patterns from the feature-salience task, B2 (n trials × m voxels), into estimated channel activations:

C2 = B2Ŵᵀ(ŴŴᵀ)⁻¹

Here, C2 (n trials × k channels) contains the estimated activation of each spatial channel on each trial. We multiplied each channel's estimated activation by its filter's spatial profile and summed across channels to render 2D reconstructions in visual field coordinates.
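The two-step estimate-then-invert procedure can be sketched with simulated data; all dimensions and values below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trials, k_channels, m_voxels = 120, 37, 400

# Simulated training set: C1 holds predicted channel responses for each
# mapping trial; B1 holds the (noiseless) voxel responses they generate
C1 = rng.random((n_trials, k_channels))
W_true = rng.standard_normal((k_channels, m_voxels))
B1 = C1 @ W_true

# Step 1 (univariate per voxel): least-squares weight estimate,
# W_hat = (C1' C1)^-1 C1' B1, computed here via the pseudoinverse
W_hat = np.linalg.pinv(C1) @ B1

# Step 2 (multivariate): invert the estimated model to recover channel
# activations from novel patterns, C2 = B2 W_hat' (W_hat W_hat')^-1
C_test = rng.random((10, k_channels))
B2 = C_test @ W_true
C2 = B2 @ W_hat.T @ np.linalg.inv(W_hat @ W_hat.T)
```

In this noiseless demo, C2 recovers the generating channel responses; with real fMRI data, the same algebra yields the estimated channel activations that are weighted by the filter profiles to render 2D maps.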
Since stimuli in the feature-selective attention task were randomly positioned on every trial, we rotated the center position of spatial filters such that the resulting 2D reconstructions of the stimuli were aligned across trials and participants (see Fig. 4B). We then sorted trials based on condition (salience-present: color, salience-present: motion, checkerboard, salience-absent: color, salience-absent: motion). Finally, we averaged the 2D reconstructions across trials within the same condition for individual participants, then across all participants for our grand-average spatial reconstructions (see Fig. 4B,C). Individual values within the 2D reconstructed spatial maps correspond to visual field coordinates. To visualize feature selectivity within reconstructed spatial maps, we computed the difference in map activation between the salience-present: color and motion conditions (see Fig. 6A). We used these difference maps to assess whether feature-selective ROIs had the same feature preferences throughout the visual field, or if they were localized to the position of the salient stimulus when present.
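The rotation-based alignment can be illustrated with a minimal sketch; the filter-center coordinates below are hypothetical:

```python
import numpy as np

def rotate(points, theta_deg):
    """Rotate visual-field coordinates (n x 2 array) about fixation by theta_deg."""
    t = np.deg2rad(theta_deg)
    R = np.array([[np.cos(t), -np.sin(t)],
                  [np.sin(t),  np.cos(t)]])
    return points @ R.T

# On a trial where the salient stimulus appeared at 90 deg polar angle,
# rotating by -90 deg moves it to a common 0 deg reference position so
# reconstructions can be averaged in register across trials
centers = np.array([[5.0, 0.0], [0.0, 5.0]])
aligned = rotate(centers, -90.0)   # the point at 90 deg now lies at (5, 0)
```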
Critically, because we reconstructed all trials from all conditions of the feature-selective attention task using an identical spatial encoding model estimated with an independent spatial mapping task, we can compare reconstructions across conditions on the same footing (Sprague et al., 2018a, 2018b, 2019). Moreover, because we were not interested in decoding precision, but instead in the activation profile across the entire reconstructed map, we did not use any feature decoding approaches and instead opted to directly characterize the resulting model-based reconstructions (e.g., correlation table) (Scotti et al., 2021). Finally, the resulting model-based reconstructions are necessarily based on the modeling choices used here and should not be used to infer any features of single-neuron tuning properties (which we do not claim in this report) (Sprague et al., 2018a, 2019). Should readers be interested in testing the impact these modeling choices have on results, all analysis code and data are freely available (see below).
Quantifying stimulus representations
To quantify the strength of stimulus representations within each reconstruction, we computed the mean map activation of pixels located within a 1.5° radius disk centered at the known position of each stimulus (matching the stimulus radius of 1.5°) (Sprague et al., 2018b). This provides a single value corresponding to the activation of the salient stimulus location for a given condition, within each retinotopic ROI. To assess the spatial selectivity of reconstructed spatial maps, we compared the mean map activation at the location of salient stimuli to map activation at the location opposite fixation using a 1.5° radius disk (see Fig. 4B,D). Previous studies using a similar IEM approach have used other methods to quantify stimulus reconstructions, such as “fidelity” (e.g., Sprague et al., 2016). Conclusions using fidelity in the current study were qualitatively and quantitatively consistent with mean map activation results, so we opted to quantify our findings with map activation as it is more intuitive.
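The disk-based quantification can be sketched in a few lines. This is an illustrative Python re-implementation (the study's analyses were run in MATLAB); `disk_activation` and the mock reconstruction below are hypothetical, and the pixel grid is an assumption:

```python
import numpy as np

def disk_activation(recon, xs, ys, center, radius=1.5):
    """Mean activation of reconstruction pixels within `radius` deg of `center`.

    recon : 2D reconstructed spatial map (ny, nx).
    xs, ys : 1D pixel coordinate axes in visual-field degrees.
    """
    xx, yy = np.meshgrid(xs, ys)
    mask = (xx - center[0])**2 + (yy - center[1])**2 <= radius**2
    return recon[mask].mean()

# mock reconstruction with an activation blob at the stimulus location (5, 0)
xs = ys = np.linspace(-7, 7, 141)            # 0.1 deg pixels (assumed grid)
recon = np.zeros((141, 141))
xx, yy = np.meshgrid(xs, ys)
recon[(xx - 5)**2 + yy**2 <= 1.5**2] = 1.0

sal = disk_activation(recon, xs, ys, (5.0, 0.0))    # salient location
opp = disk_activation(recon, xs, ys, (-5.0, 0.0))   # location opposite fixation
```

Comparing `sal` against `opp` implements the spatial-selectivity contrast described above.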
To compare values across conditions, we computed a difference in extracted map activation across conditions (e.g., subtracted color map activation from motion map activation; see Fig. 6). As an exploratory analysis, we computed these differences at every pixel in reconstructed maps. For quantification, we focused on activation averaged over the discs aligned to the salient stimulus location within each map or the opposite location (see Fig. 6C).
Visualizing and quantifying stimulus salience using gaze position
To ensure that our stimuli were able to capture attention in the absence of an instructed fixation task, we analyzed the eye position data from the salience control experiment. We first generated gaze heatmaps for each of the salience conditions using gaze fixation data extracted using a velocity threshold of 22°/s and an acceleration threshold of 3800°/s². We plotted the x and y positions of each fixation, rotated fixations based on the known location of the salient stimulus on each trial, and then smoothed the maps with a 2D Gaussian kernel using the MATLAB function imgaussfilt (Fig. 3A; kernel σ = 0.33°). Fixations that were within 2° of the central fixation point were excluded from the heatmaps. We quantified a behavioral index of stimulus salience by computing the proportion of first fixations on each trial that landed at the salient stimulus location (1.5° radius disk at 5.0° eccentricity) and comparing it to the proportion landing at the opposite location (a 1.5° radius disk at the same eccentricity on the opposite side of fixation; Fig. 3B).
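The first-fixation index can be sketched as follows. This is an illustrative Python version (the original analysis used MATLAB); the function name, arguments, and example fixations are hypothetical:

```python
import numpy as np

def first_fixation_proportions(fixations, stim_angles_deg, ecc=5.0, radius=1.5):
    """Proportion of first fixations landing in the salient vs. opposite disk.

    fixations : (n_trials, 2) array of (x, y) first-fixation positions (deg).
    stim_angles_deg : polar angle of the salient stimulus on each trial.
    Fixations are rotated so the salient location aligns to (ecc, 0).
    """
    th = np.deg2rad(-np.asarray(stim_angles_deg))
    x = fixations[:, 0] * np.cos(th) - fixations[:, 1] * np.sin(th)
    y = fixations[:, 0] * np.sin(th) + fixations[:, 1] * np.cos(th)
    at_sal = (x - ecc)**2 + y**2 <= radius**2   # within salient-location disk
    at_opp = (x + ecc)**2 + y**2 <= radius**2   # within opposite-location disk
    return at_sal.mean(), at_opp.mean()

# four hypothetical trials: two land on the salient disk, one on the opposite
fix = np.array([[0.0, 5.0], [5.0, 0.0], [-4.9, 0.2], [2.0, 2.0]])
angles = [90.0, 0.0, 0.0, 0.0]
p_sal, p_opp = first_fixation_proportions(fix, angles)
```

The two proportions need not sum to 1 because fixations can land anywhere on the screen.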
Statistical analysis
We used parametric statistical tests for all comparisons (repeated-measures ANOVAs and t tests). To account for possible deviations from normality in our data, we generated null distributions for each test using a permutation procedure (see below) and used these to derive p values.
First, we used a one-way repeated-measures ANOVA (factor: stimulus condition; five levels: motion salience, color salience, checkerboard, nonsalient motion, and nonsalient color) to determine whether behavioral performance on the fixation task depended on the type of ignored peripheral stimulus presented on each trial. All behavioral analyses only used fixation target trials that occurred during the stimulus presentation period, as this was the trial period of interest in neuroimaging analyses. For the salience control experiment (Fig. 3), we compared the proportion of first fixations with a two-way repeated-measures ANOVA with factors of stimulus condition (three levels: motion salience, color salience, checkerboard) and location (two levels: salient location, opposite location). To confirm that the salient location captured attention, we then performed follow-up paired-samples t tests between the proportions of fixations to the aligned salient and opposite locations (Fig. 3B). The proportions of first fixations to the salient and opposite locations do not sum to 1 because participants could fixate other locations on the screen.
For all primary fMRI analyses, we focused on sets of retinotopically defined regions selected a priori based on previous reports establishing feature selectivity for color or motion (see above). Additionally, for completeness, we repeated these tests across each individual retinotopic ROI (see Figs. 5, 7). To determine the spatial selectivity of reconstructed spatial maps based on fMRI activation patterns, we computed a three-way repeated-measures ANOVA with location (salient location, opposite location, and aligned position in salience-absent conditions), ROI, and feature (motion/color) as factors (Figs. 4D, 5D). To directly test whether feature-selective ROIs represent salient locations more strongly when salience is defined by their preferred feature value, we computed a paired-samples t test on the difference between map activation on color-salience and motion-salience trials between color-selective and motion-selective ROIs (Fig. 6). We compared the same difference across all individual ROIs using a one-way repeated-measures ANOVA with ROI as factor (Fig. 7C). Finally, we assessed the spatial selectivity of feature-selective responses by computing a two-way ANOVA with ROI and location (salient location, location opposite the salient stimulus, and aligned position in salience-absent condition) as factors (Figs. 6B, 7B).
For our shuffling procedure, we used a random number generator seeded with a single value for all analyses. The seed was randomly selected using an online random number generator (https://numbergenerator.org/random-8-digit-number-generator). For each participant individually, condition-averaged data were shuffled across conditions and the statistical test of interest was recomputed; this procedure was repeated over 1000 iterations. p values were derived by computing the proportion of shuffled test statistics that were greater than or equal to the measured test statistic. We controlled for multiple comparisons using the false discovery rate (Benjamini and Yekutieli, 2001) across all comparisons within an analysis when necessary. Error bars indicate SE, unless noted otherwise.
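The shuffle-and-recompute logic can be sketched as below. This is a simplified Python illustration, not the study's code: `permuted_p`, the seed value, and the example statistic (a mean paired difference between two conditions) are all hypothetical:

```python
import numpy as np

def permuted_p(data, stat_fn, n_iter=1000, seed=12345678):
    """One-sided permutation p value for a condition-difference statistic.

    data : (n_participants, n_conditions) condition-averaged values.
    stat_fn : function mapping data -> scalar test statistic.
    Condition labels are shuffled independently within each participant,
    the statistic is recomputed on each iteration, and p is the proportion
    of shuffled statistics >= the observed one.
    """
    rng = np.random.default_rng(seed)   # single seed fixed for all analyses
    observed = stat_fn(data)
    null = np.empty(n_iter)
    for i in range(n_iter):
        shuf = np.array([rng.permutation(row) for row in data])
        null[i] = stat_fn(shuf)
    return np.mean(null >= observed)

# hypothetical per-participant values for two conditions
data = np.array([[1.2, 0.3], [0.9, 0.1], [1.1, 0.4], [0.8, 0.2]])
p = permuted_p(data, lambda d: (d[:, 0] - d[:, 1]).mean())
```

With a fixed seed, the derived p values are fully reproducible across reruns of the analysis.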
Data and code availability
All data supporting the conclusions of this report and all associated analysis scripts are available on Open Science Framework (https://osf.io/wkb67/). To protect participant privacy, and in accordance with Institutional Review Board-approved procedures, freely available data are limited to extracted time series for each voxel of each ROI for all scans of the study. Whole-brain “raw” data will be made available from the authors on reasonable request.
Results
Behavior
Participants continuously monitored a central fixation cross for brief changes in line segment length while viewing stimuli in the periphery (Fig. 2). Stimuli could either be full-screen arrays of colored static or grayscale moving dots, or flickering checkerboards. When stimuli were dot arrays, they typically contained a salient region (colored dots: a disk appeared in a different color, 180° away in HSV color space; moving dots: a disk contained dots moving in the opposing motion direction). This design requires attention to be maintained at fixation, and allows for bottom-up salience to be isolated and evaluated in response to the ignored, task-irrelevant peripheral stimulus as a function of the salience-defining feature. Across runs, we adjusted fixation task difficulty to keep behavioral performance above chance and below ceiling and to maximize participant engagement (average response accuracy across conditions: 81.07 ± 2.13%, mean ± SEM across participants; average miss rate across conditions: 26.49 ± 9.12%). Importantly, we observed no difference in response accuracy (p = 0.7; one-way permuted repeated-measures ANOVA) or miss rate (p = 0.29; one-way permuted repeated-measures ANOVA) as a function of peripheral stimulus type. Thus, any differences observed in multivariate activation patterns between conditions cannot be driven by differences in behavioral performance.
Feature-salience task. A, On each fMRI scanning run, participants continuously performed an attention-demanding task at fixation where they reported changes in length of either the horizontal or vertical bar of the fixation cross. While attention was directed to the demanding fixation task, we measured how feature-selective retinotopic ROIs encode task-irrelevant salient stimulus locations by presenting various types of visual stimuli. On most stimulus presentation trials, the visual stimulus consisted of dots spanning the entire screen. The dot stimuli could either be presented as static colored dots, or grayscale (black/white) moving dots. Subjects received feedback after each target presentation via color changes in the ring around fixation (green represents correct button press; red represents incorrect button press; yellow represents no button press). B, The features of all dots were updated at 3 Hz such that, on average, the overall feature value (color/motion) presented across each trial was neutral. For example, on “color” trials, the color and location of each static dot were updated every 333 ms with a new randomly selected hue and randomly drawn location. On most dot array trials (66.6%; “salience-present”), a circular portion of the stimulus display was made salient by presenting dots in the opposite feature value as presented in the background. For example, during a 333 ms period, if the background dots were moving at 45°, the salient foreground dots would be moving 225°. These salient stimulus regions were never relevant for participant behavior, and the challenging fixation task ensured attention was withdrawn from peripheral stimuli. After stimulus presentation, there was a 6-9 s blank ITI during which time the fixation task continued. Example trial for each condition shown. 
C, As control conditions, we also included trials with the salient location defined by a flickering checkerboard (6 Hz full-field flicker) on a blank background, and trials with colored static or moving black and white dots with no salient location. D, On salience-present trials, the salient stimulus was 1.5° in radius, and was presented at a location randomly chosen from an invisible ring centered 5° from fixation.
To verify that our stimuli were behaviorally salient in the absence of a demanding fixation task, we acquired eyetracking data outside the scanner from naive participants who were encouraged to freely view the display while viewing the same dot arrays that appeared in the MRI version of the task. The first fixation after the appearance of the stimulus array was most commonly directed to the salient location (Fig. 3). A two-way permuted repeated-measures ANOVA with salience condition (salient motion, salient color, and checkerboard) and activation location (salient location and opposite location) as factors showed only a significant main effect of location (p < 0.001). Follow-up comparisons show that the first fixation of each trial was more likely to be directed to the salient location than the opposite location for the salient motion (p = 0.006), color (p = 0.002), and checkerboard (p < 0.001) conditions. These results verify that our stimuli were sufficiently salient to capture attention, although no behavioral differences were observed in the demanding fixation task conducted during the MRI sessions.
Salient locations in task stimuli are fixated during free-viewing (salience control experiment). A, Participants viewed the same stimuli that were presented in the MRI version of the experiment with slight modifications (see Materials and Methods). We plotted the first fixation from each trial of the salient stimulus conditions (salient motion, salient color, and checkerboard). We rotated fixation coordinates on each trial based on the known salient stimulus location (all trials aligned such that the stimulus is at the 0° position). We generated heatmaps by smoothing the aligned 2D fixation histogram (summed across trials) with a 2D Gaussian after removing eye movements near fixation (using MATLAB function imgaussfilt: kernel σ = 0.33°). Fixations within 2° radius of the screen center were excluded from the heatmaps. B, We quantified fixation heatmaps by computing the proportion of first fixations directed to the aligned stimulus location and compared this value to the proportion of fixations to the opposite location. Across conditions, the salient stimulus was fixated more than the opposite location. Error bars indicate SEM across participants. *Significant difference based on permuted paired-samples t test (p < 0.001).
Multivariate spatial representations
Next, we used a spatial IEM to reconstruct spatial maps based on measured activation patterns from each ROI on each trial. We used data from an independent “mapping” task (see Materials and Methods) (Sprague et al., 2018a) to estimate a spatial encoding model for each voxel parameterized as a set of weights on smooth, overlapping spatial channels. Then, we inverted the set of encoding models across all voxels in each cluster of regions to reconstruct spatial maps based on activation profiles from the feature-salience task (Fig. 4A). This procedure generates a reconstructed image for each time point, which we then averaged within condition and across timepoints corresponding to 5-8 s after stimulus onset. The resulting images are well established to show strong activation at locations corresponding to visual stimulation (e.g., where a checkerboard was presented) (Sprague and Serences, 2013; Sprague et al., 2018b, 2019). Indeed, when the “checkerboard” trials were used for stimulus reconstruction, we observed strong representations of the salient stimulus location in all ROIs (Fig. 4B).
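The estimate-then-invert logic of the spatial IEM can be sketched with noiseless synthetic data. This is a simplified one-dimensional Python illustration under assumed dimensions (the study used 2D spatial channels and measured fMRI data); the least-squares solution via pseudoinverse is a standard choice, not necessarily the exact solver used by the authors:

```python
import numpy as np

rng = np.random.default_rng(1)

# spatial channel basis: n_chan smooth filters evaluated at n_pix positions
n_vox, n_chan, n_pix = 50, 8, 64
pos = np.linspace(-7, 7, n_pix)
centers = np.linspace(-6, 6, n_chan)
basis = np.exp(-(pos[None, :] - centers[:, None])**2 / (2 * 1.5**2))  # (n_chan, n_pix)

# training ("mapping" task): channel responses C1 and voxel data B1 = W @ C1
W_true = rng.normal(size=(n_vox, n_chan))
C1 = rng.normal(size=(n_chan, 200))
B1 = W_true @ C1

# 1) estimate each voxel's channel weights with least squares
W_hat = B1 @ np.linalg.pinv(C1)

# 2) invert the model for a held-out trial to recover channel responses,
#    then project through the basis to get a reconstructed spatial profile
C2 = rng.normal(size=(n_chan, 1))
B2 = W_true @ C2
C_hat = np.linalg.pinv(W_hat) @ B2
recon = (C_hat.T @ basis).ravel()   # activation profile over visual positions
```

Because the training data here are noiseless, the recovered channel responses match the generating ones exactly; with real fMRI data the inversion yields a smoothed, noisy estimate instead.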
Reconstructed spatial maps track salient stimulus location. A, We estimated a spatial IEM for each ROI using an independent spatial mapping task (for details, see Materials and Methods). Using this spatial encoding model, which maps activation patterns to activation of spatial channels that can be summed to produce reconstructed spatial maps, we were able to generate image reconstructions of the visual field on each trial and directly compare map activation across conditions. For each condition, we averaged trial-wise reconstructions computed using activation patterns from 5 to 8 s after stimulus onset after we rotated and aligned them to the known position of the salient stimulus, if present. B, To validate the utility of our method, we computed reconstructions of checkerboard trials from each ROI. Qualitatively, there was a strong response to the checkerboard stimulus across both aggregate ROIs. To quantify reconstructions, we computed the mean map activation at the aligned stimulus location and at the location on the opposite side of fixation within each ROI. In each ROI's reconstruction, activation at the stimulus location was greater than at the opposite location. Two-way permuted repeated-measures ANOVA (ROI; location) identified a significant main effect of location (p < 0.001). C, Qualitatively, the salient location was highlighted in the aggregate motion ROI TO1/TO2 when the salient location was defined by motion, but not by color, with the converse result observed in the aggregate color ROI hV4/VO1/VO2. On salience-absent trials, no location is reliably highlighted in average reconstructions. 
D, Using data from each stimulus condition, we identified whether enhanced reconstruction responses were localized to the salient stimulus by comparing mean salient location activation (Sal) to the mean activation of the position opposite of the salient location (Opp), as well as the mean activation of the “aligned” position of the salience-absent condition (No-sal). On trials with motion-defined stimuli, activation in the motion-selective ROI was greatest at the location of the salient motion stimulus. When stimuli were defined by static colorful dots, activation in the color-selective ROI was greatest at the location of the salient color stimulus. A-D, Error bars indicate SEM across participants (n = 8). *Significant difference based on permuted paired t test, corrected for multiple comparisons with FDR. Three-way permuted repeated-measures ANOVA (ROI; location; stimulus feature) identified a significant main effect of location (p < 0.001), two-way interaction between feature and ROI (p = 0.007), and a three-way interaction (p = 0.001). For all statistical comparisons, see Table 1.
Having validated our method, we asked: does activation within these reconstructions additionally track locations made salient based on local differences in feature values? In our study, the entire visual field is equivalently stimulated (e.g., equal amount of motion energy or density of colored dots), so any nonuniform activation must be because of salience-related activation patterns within each ROI. Indeed, model-based reconstructions were able to track the salient location throughout the visual field on both motion- and color-salient trials (Fig. 4C). When the salient location was defined by dots moving in a different motion direction, the reconstructed spatial map from TO1/TO2 showed a stronger representation of the salient location than when the salient location was defined by static dots presented in a different color, and the converse result is apparent when examining spatial maps reconstructed from hV4/VO1/VO2 (Fig. 4C). Critically, in these trials, the feature values at each location are updated at 3 Hz, minimizing the possibility that reconstructions of salient locations emerge from a serendipitous selection of a specific local feature value.
To quantify the condition-specific stimulus representation within model-based reconstructions for each ROI, we computed the mean activation at the known position of the salient stimulus (see Sprague et al., 2018a) (Fig. 4B). For ROIs which compute spatial maps of salient location(s), we predict that reconstructions will show an enhanced representation of the salient stimulus position compared with nonsalient locations. Our design allows for two important comparisons to establish whether these salience computations occur. First, we can directly compare the activation in reconstructed spatial maps at the salient location to activation in the location on the opposite side of the fixation point (which contains nonsalient “background” dots with an equal amount of color/motion energy as the salient location). This comparison allows us to demonstrate that spatial maps highlight salient locations within each salient stimulus condition. Second, we can compare the mean activation of the salient location on salience-present trials to a randomly selected location of the reconstructed spatial map on salience-absent trials. This allows us to see whether the map activation at the salient location was greater than map activation at an equivalent spatial location when viewing a uniform dot array with no salient position(s).
Map activation values were strongest at the salient location when a salient stimulus was presented, and weaker at nonsalient locations (both when a salient stimulus was presented elsewhere and when no salient stimulus was present at all; Fig. 4D). We compared map activation values across conditions, map locations, and ROIs using a three-way repeated-measures permuted ANOVA with stimulus feature (motion/color), activation location (salience-absent; salience-present: salient location; salience-present: opposite location), and ROI (TO1/TO2; hV4/VO1/VO2) as factors. This analysis indicated that there was a main effect of activation location (p < 0.001), a two-way interaction between stimulus feature and ROI (p = 0.007), and a three-way interaction between all three factors (p = 0.001). All other comparisons were nonsignificant (p > 0.05).
Within hV4/VO1/VO2, we observed a significant difference between map activation at the salient location and opposite location on color salience-present trials (p < 0.001, permuted paired-samples t test; Fig. 4D) as well as a significant difference between the color-defined salient location and map activation on salience-absent colored dot trials (p = 0.008, permuted paired-samples t test). Map activation at the salient location was significantly >0 (p < 0.001; permuted one-sample t test).
We found complementary results in TO1/TO2 when using data from the motion stimulus conditions (Fig. 4D). We observed a significant difference in map activation between the salient location defined by motion and the opposite location on salience-present trials (p < 0.001, permuted paired-samples t test) in addition to a significant difference between the map activation at the salient location and map activation from similar locations on trials when no salient location was defined (p < 0.001, permuted paired-samples t test). Map activation at salient locations defined by motion was also >0 (p < 0.001; permuted one-sample t test). Together, these results suggest that activation patterns in these regions reflect the image-computable salience of the corresponding location in the visual field.
Comparing salience computations across feature dimensions
Thus far, we have shown that color- and motion-selective regions each compute a representation of the location of a salient stimulus defined by feature contrast. If these regions act as neural dimension maps which each individually compute representations of salient locations defined by their preferred feature value, we expect to observe a more efficient extraction of salient locations when the salience-defining feature matches the region's preferred feature value. We tested this by computing a pixelwise difference between reconstructed spatial maps from each ROI when salience was defined based on color and when salience was defined based on motion (Fig. 6A), along with the same difference between the salience-absent control conditions. Values near zero (white) indicate that the map activation is equal between stimulus features, while positive (red)/negative (blue) values near the salient location indicate that the map preferentially extracts salient locations when the salience defining feature is motion or color, respectively. If each ROI selectively identifies salient locations based on its preferred feature value, then these difference maps will show greater absolute differences at the salient location than other locations. However, if instead feature selectivity and salience computations each independently and additively impact spatial maps, then these difference maps should show no spatial structure (particularly, no difference between salient and nonsalient locations).
Qualitatively comparing these difference maps (Fig. 6A), it is apparent that both ROIs show a stronger difference at the salient location than the opposite location, consistent with a local and specialized computation of salient locations based on each region's preferred feature values. The localized differences at the salient location are in opposite directions (TO1/TO2: positive/red; hV4/VO1/VO2: negative/blue), as expected if motion (color)-selective ROIs more efficiently extract salient locations defined by motion (color) feature contrast than color (motion) feature contrast.
We quantified the degree to which each ROI selectively computes salience based on its preferred feature value by extracting activation values from these difference maps at the salient location, the opposite location, and the "aligned" location in the salience-absent condition and computing map activation difference scores based on the regions' feature preferences (TO1/TO2: motion – color; hV4/VO1/VO2: color – motion; Fig. 6B). A two-way repeated-measures ANOVA with location (salient location, opposite location, salience-absent) and ROI as factors revealed a significant main effect of location (p = 0.002). Follow-up comparisons in TO1/TO2 demonstrate that map activation was greater at the salient location than the opposite location (p = 0.016, permuted paired-samples t test) and the salience-absent location (p < 0.001, permuted paired-samples t test) and was the only location with an activation difference >0 (p = 0.005, permuted one-sample t test). In hV4/VO1/VO2, the same tests did not reveal significant differences between the salient location and the opposite location (p = 0.074, permuted paired-samples t test) or the salience-absent position (p = 0.556, permuted paired-samples t test). However, only the salient location had map activation differences >0 after FDR corrections (p = 0.001, permuted one-sample t test). Together, these results suggest that each ROI selectively indexes a salient location when salience is defined based on its preferred feature value.
Finally, to directly compare salience computations between regions, we computed a salience modulation index. This was defined as the difference between map activation at the salient location between the motion and color conditions, where positive values indicate a stronger response to the motion-defined salient location, negative values indicate a stronger response to the color-defined salient location, and zero indicates no difference between conditions. Salience modulation index reliably differed between motion- and color-selective ROIs (Fig. 6C; p < 0.001; permuted paired-samples t test). This indicates that feature-selective ROIs preferentially compute salience based on their preferred feature dimension, and further supports the proposal that these retinotopically defined regions act as neural dimension maps within priority map theory.
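The salience modulation index is a simple condition difference; a minimal Python sketch (function name and example values are hypothetical, for illustration only):

```python
import numpy as np

def salience_modulation_index(act_motion, act_color):
    """Map activation at the salient location: motion minus color trials.

    Positive values -> stronger response to motion-defined salience;
    negative values -> stronger response to color-defined salience;
    zero -> no difference between conditions.
    """
    return np.asarray(act_motion, dtype=float) - np.asarray(act_color, dtype=float)

# hypothetical per-participant salient-location activations for one ROI
smi = salience_modulation_index([0.8, 0.7, 0.9], [0.3, 0.4, 0.2])
```

For a motion-selective ROI such as TO1/TO2, the prediction tested above is that this index is reliably positive, and reliably negative for color-selective hV4/VO1/VO2.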
Discussion
In the present study, our goal was to determine whether visual cortex computes spatial maps representing salient locations based on specific feature dimensions within feature-selective retinotopic regions (Fig. 1), a key prediction of priority map theory. We probed this question by reconstructing spatial maps based on fMRI activation patterns measured while participants viewed, but ignored, stimuli containing salient locations based on different feature dimensions (Fig. 2). Our results show that salient location representations in color-selective regions hV4/VO1/VO2 and motion-selective regions TO1/TO2 are modulated by bottom-up feature salience, even when top-down attention is kept at fixation (Fig. 4). Representations were selectively enhanced when the salience-defining feature matched the preferred feature of a given region and these enhancements occurred at the salient location (Fig. 6). These results provide strong evidence that these retinotopic cortical regions act as “neural feature dimension maps,” confirming an important aspect of priority map theory.
In previous studies in humans and nonhuman primates, neural correlates of salience and/or priority maps have been identified in several regions, including the following: LGN (Kastner et al., 2006; Poltoratski et al., 2017), V1 (Li, 2002; Zhang et al., 2012; Poltoratski et al., 2017; Wang et al., 2022), extrastriate visual cortex (including V4/hV4) (Mazer and Gallant, 2003; Burrows and Moore, 2009; Bogler et al., 2011, 2013; Sprague and Serences, 2013; Poltoratski et al., 2017; Sprague et al., 2018b; Adam and Serences, 2021), lateral intraparietal/intraparietal sulcus (Gottlieb et al., 1998; Bisley and Goldberg, 2003, 2006; Jerde et al., 2012; Sprague et al., 2018b; Chen et al., 2020; Adam and Serences, 2021), frontal eye field (Schall and Hanes, 1993; Schall et al., 1995; Bichot and Schall, 1999), substantia nigra (Basso and Wurtz, 2002), pulvinar (Shipp, 2003), and superior colliculus (Basso and Wurtz, 1998; Fecteau and Munoz, 2006; White et al., 2017). Across these studies, activity in neurons/voxels tuned for salient and/or relevant locations is greater than activity in neurons/voxels tuned toward nonsalient and/or nonrelevant locations. However, many of these previous studies are limited by focusing on a single salience-defining feature and/or by relying on sparse single-unit recordings from one or a handful of cells within one or a small number of brain regions in nonhuman primates. Such studies necessarily face difficulty comparing across different brain regions and assessing activation profiles across the entire region (and thus, the entire visual field). Here, we overcame these limitations by implementing a multivariate IEM, which allowed us to reconstruct activation profiles across a map of the entire visual field from each time point's measured activation pattern in each ROI (Fig. 3A). Additionally, by manipulating the salience-defining stimulus feature dimension across trials (Fig. 2) while simultaneously measuring fMRI activation patterns across multiple feature-selective ROIs (Figs. 4C,D, 5C,D), we established that the region best representing a salient location depends on the salience-defining feature (Figs. 6, 7).
Reconstructed spatial maps track salient stimulus location across all retinotopic ROIs. A, Reconstructions of checkerboard stimuli from each individual retinotopic ROI. Qualitatively, there was a strong response to the checkerboard stimulus across all ROIs. B, We quantified reconstructions by computing the mean map activation at the aligned checkerboard stimulus location and at the location on the opposite side of fixation within each ROI. Permuted two-way repeated-measures ANOVA (ROI and location) showed a significant main effect of ROI (p < 0.001), location (p < 0.001), and interaction (p < 0.001). Map activation was stronger at the location of the checkerboard compared with the opposite location in all ROIs. *Significant difference based on permuted paired t test, corrected for multiple comparisons with FDR. C, Reconstructions were computed for the salience-present and -absent conditions for both features (color/motion; as in Fig. 4). Qualitatively, the salient location was highlighted in feature-selective ROIs when the salient location was defined by motion, but not by color, with the converse result observed in color-selective ROIs. On salience-absent trials, no location is reliably highlighted in average reconstructions. D, Using data from each stimulus condition, we identified whether enhanced reconstruction responses were localized to the salient location by comparing mean salient location activation to the mean activation of the position opposite of the salient location, as well as the mean activation of the “aligned” position of the salience-absent condition. On trials with motion-defined stimuli, activation in motion-selective ROIs was greatest at the location of the salient motion patch. When stimuli were defined by static colorful dots, activation in the color-selective ROIs was greatest at the location of the salient color stimulus. *Significant difference between salient and opposite locations. 
+Significant difference between the salient location and salience-absent reconstructions. Both sets of tests are based on permuted paired t test and corrected for multiple comparisons with FDR. Three-way permuted repeated-measures ANOVA (ROI; location; stimulus feature) identified a significant main effect of location (p < 0.001), two-way interaction between feature and ROI (p < 0.001), and a three-way interaction (p < 0.001). Individual feature-selective ROIs are highlighted (red represents color-selective regions; blue represents motion-selective regions). For all statistical comparisons, see Table 1.
Neural dimension maps selectively index salience based on their preferred feature. A, We directly compared reconstructed spatial maps for each ROI between trials where the salient location was defined by motion to those where the salient location was defined by color by computing their pixelwise difference (motion – color). For comparison, we also computed these maps for salience-absent trials. Positive values indicate a region more strongly represents a location based on motion stimuli over color stimuli, and negative values indicate the opposite. B, To compare feature selectivity across spatial locations and salience-presence conditions, we extracted values of each ROI's difference map computed between the preferred and nonpreferred feature dimension for that ROI (see diagrams). Difference map activation was more positive (preferred > nonpreferred) at the salient location than the opposite location on salience-present trials, and more positive than a random location on salience-absent trials (two-way permuted repeated-measures ANOVA with factors of location and ROI; significant main effect of location p = 0.002). Additionally, difference map values were only reliably >0 at the aligned position on salience-present trials (one-sample t tests against zero, FDR-corrected, p ≤ 0.005), indicating that these ROIs preferentially encode salient locations based on their preferred feature dimension. *Significant difference from zero. C, Difference map activation (A) computed at the salient location reliably differed between ROIs, such that motion-selective TO1/TO2 indexed salience more strongly when it was defined by motion than by color, and vice versa for color-selective hV4/VO1/VO2. *Significant difference based on permuted paired-samples t test (p < 0.001). Error bars indicate SEM across participants.
Individual feature-selective ROIs index salience based on their preferred feature. A, We directly compared reconstructed spatial maps for each ROI between trials where the salient location was defined by motion and trials where it was defined by color by computing the pixelwise difference (motion – color). For comparison, we also computed these maps for salience-absent trials. Positive values indicate that a region more strongly represents a salient location based on motion than color, and negative values indicate the opposite. Data presented as in Figure 6A. B, To compare feature selectivity across spatial locations and salience-presence conditions, we extracted values of each ROI's difference map at the salient location and the opposite location on salience-present trials, and at a random location on salience-absent trials. Absolute difference map activation was greater at the salient location than at the opposite location on salience-present trials and than at a random location on salience-absent trials, and this effect depended on ROI (two-way permuted repeated-measures ANOVA with factors of location and ROI; significant main effect of ROI, p < 0.001, and interaction, p < 0.001). C, Difference map activation (A) computed at the salient location reliably differed from 0 only in feature-selective ROIs, such that motion-selective regions indexed salience more strongly when it was defined by motion than by color, and vice versa for color-selective regions. *Significant difference from zero based on permuted one-sample t test (p < 0.05). Error bars indicate SEM across participants. Individual feature-selective ROIs are highlighted (red represents color-selective regions; blue represents motion-selective regions). For all statistical tests, see Table 2.
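The pixelwise difference-map analysis described in these captions can be sketched as follows. This is a minimal illustration with synthetic maps, not the authors' analysis code; the array sizes, values, and salient-location coordinates are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic reconstructed spatial maps for one ROI (values illustrative):
# one map averaged over motion-defined-salience trials and one averaged
# over color-defined-salience trials, each a 2D image of visual space.
recon_motion = rng.random((64, 64))
recon_color = rng.random((64, 64))

# Pixelwise difference map (motion - color): positive values indicate a
# stronger representation of a location for motion- than color-defined
# salience; negative values indicate the opposite.
diff_map = recon_motion - recon_color

# Extract the difference value at the salient location to compare
# feature selectivity across conditions and ROIs.
salient_xy = (32, 20)  # hypothetical stimulus-aligned pixel, for illustration
diff_at_salient = diff_map[salient_xy]
```

In this scheme, averaging `diff_at_salient` across participants for a motion-selective versus a color-selective ROI yields the ROI-by-feature comparison summarized in panel C.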
As mentioned above, there is extensive, and often conflicting, evidence for priority maps implemented in different structures throughout the brain. Given the seemingly redundant computations of priority across regions, how is information across maps ultimately leveraged to guide attention? We expect that measurements of feature dimension maps like those identified in the current study can be used to disentangle these accounts by establishing which regions integrate information about salient locations across combinations of features. One testable prediction of the priority map framework is that activation profiles in a feature-agnostic priority map should reflect some computation over the activation profiles measured across individual feature dimension maps, such as linear combination (Wolfe, 1994), winner-take-all selection (Itti and Koch, 2001), or probabilistic integration (Eckstein, 2017). While such a test is not possible in our current study, future work incorporating stimuli with multiple salient locations with different degrees of stimulus-defined salience may better disentangle the roles of various putative priority maps in guiding visual attention based on stimulus properties.
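The candidate integration schemes named above can be sketched as follows. This is a toy illustration on synthetic feature dimension maps (the grid size, values, and equal weighting are assumptions), not a model fit to the present data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical feature dimension maps (e.g., color- and motion-defined
# salience), each a 2D grid over visual space; values are illustrative.
color_map = rng.random((8, 8))
motion_map = rng.random((8, 8))
feature_maps = [color_map, motion_map]

# Linear combination (cf. Wolfe, 1994): weighted sum of dimension maps.
weights = [0.5, 0.5]
priority_linear = sum(w * m for w, m in zip(weights, feature_maps))

# Winner-take-all across dimensions (cf. Itti & Koch, 2001): each location
# inherits the strongest activation from any single dimension map.
priority_wta = np.maximum.reduce(feature_maps)

# Probabilistic integration (cf. Eckstein, 2017): treat each map value as a
# likelihood and sum log-likelihoods per location (i.e., multiply evidence
# across dimensions; the small constant avoids log(0)).
priority_prob = np.sum(np.log(np.stack(feature_maps) + 1e-9), axis=0)

# Under any scheme, the predicted attended location is the aggregate peak.
attend_loc = np.unravel_index(np.argmax(priority_linear), priority_linear.shape)
```

The testable prediction is that activation profiles measured from a feature-agnostic priority map should match one of these aggregates of the profiles measured from individual feature dimension maps better than the others.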
Aspects of priority map theory are invoked by foundational models of behavioral performance on visual search tasks (Duncan and Humphreys, 1989; Folk et al., 1992; Müller et al., 1995; Theeuwes, 2010). For example, when a subject is tasked with searching for one shape among a homogeneous array of other shapes (e.g., a circle among squares), search performance is slower if one of the distracting stimuli is presented in a different color (Theeuwes, 1992), and slowing is greater for larger target/distractor color discrepancies (Duncan and Humphreys, 1989; Theeuwes, 1992; Wolfe and Horowitz, 2017). Influential cognitive models posit that the distractor results in greater activation in a priority map, which slows search for the shape-defined target stimulus. Indeed, reconstructed spatial representations of target and distractor stimuli measured from extrastriate visual and parietal cortex during an adapted version of this task show enhanced neural representations for color-defined distracting stimuli (Adam and Serences, 2021). While cognitive priority map models have offered parsimonious explanations of changes in discrimination performance, response times, and gaze trajectories, they often disagree on when and how salient items capture attention (Luck et al., 2021).
One central issue limiting the ability to adjudicate among competing models is that there is no well-established method for quantitatively measuring how neural activity indexes the relative salience of different aspects of stimulus displays (Chang et al., 2021; Gaspelin and Luck, 2021; Pearson et al., 2021). Our findings may offer a practical solution to this challenge. Here, we demonstrate that neural representations of feature-based salience can be tracked simultaneously across multiple feature-selective regions, and previous work has identified similar salience representations across multiple retinotopic regions using stimuli varying in luminance contrast (Sprague et al., 2018a). Together, this work establishes a framework for empirically estimating neural representations of visual salience across cortical processing stages, including feature dimension maps. Using these methods, future work can investigate how competing stimuli made salient by different feature dimensions are represented within retinotopic maps, and how the relative strength of those representations and their interactions within and between maps may explain how salient stimuli capture attention as measured behaviorally.
An example of a cognitive model that can be informed by these neural observations is the “attentional window” account of attentional capture (Theeuwes, 2010, 2023a,b). This model hypothesizes that difficult tasks requiring attention directed to a narrow region of the screen result in less attentional capture by salient distracting stimuli appearing outside the attended “window,” because the distracting stimuli are not processed to a sufficient degree to guide attention (compared with when attention is directed to a larger area of the screen, which allows for the distracting stimuli to capture attention). Our study offers an important new constraint on this model: even when attention was narrowly directed to a challenging task at fixation, and when no behavioral impacts of the salient peripheral stimuli could be observed inside the scanner, we were able to identify representations of salient locations in feature dimension maps (Figs. 4, 6). Thus, if variations in the size of an attentional window account for differences in attentional capture between task designs, we predict this window is likely to operate at a later stage, after feature-specific salience is computed in neural feature dimension maps.
Our results show that feature-selective retinotopic ROIs compute a stronger representation of the salient location when it is defined by their preferred feature dimension than by a nonpreferred feature dimension. However, each ROI still represents the salient location, even when it is made salient by the nonpreferred feature value (Fig. 4). We speculate that this reflects feedback from higher-order regions (e.g., parietal or frontal cortex) that aggregate salience maps across individual feature dimensions to guide attention to important locations in the scene. Because the observers' task inside the scanner required careful fixation and the stimulus was always irrelevant, such automatic extraction of salient locations was never used by participants to guide covert or overt attention (though overt attention was guided to these locations when participants were allowed to free-view, Fig. 3). However, it may be the case that the automatic identification of salient scene locations results in feedback signals across retinotopic cortex, similar to widespread retinotopic effects of cued spatial attention observed previously (e.g., Tootell et al., 1998; Gandhi et al., 1999; Sprague and Serences, 2013; Sprague et al., 2018b; Itthipuripat et al., 2019). Indeed, reconstructions based on parietal cortex activation patterns show representations of the salient location, as do those from all other retinotopic regions studied (Fig. 5C,D). Importantly, only feature-selective regions TO1/TO2 and hV4/VO1/VO2 show a systematic change in the representation of the salient location as a function of the salience-defining feature, supporting their role as neural feature dimension maps despite their weaker representation of a salient location based on a nonpreferred feature (Fig. 7).
While this study provides evidence that specialized computations support the identification of salient locations based on different feature values, there remain some important limitations to this work. First, to maximize our ability to detect representations of salient locations, we used stimuli of fixed size but random location defined by 100% feature contrast (color: opposite hues in HSV color space; motion: opposite motion directions). Future research that parametrically manipulates the size, number, and feature contrast of salient stimulus locations in similar stimulus displays (e.g., Burrows and Moore, 2009; Zhang et al., 2012; Bogler et al., 2013) could enable both comparison of reconstructed spatial maps across levels of stimulus salience and in-depth forward modeling of salience-related computations and their associated nonlinearities based on the local feature contrast input to each voxel's receptive field (e.g., Kay et al., 2013b; Yildirim et al., 2018; Hughes et al., 2019). Second, future studies are required to test whether activation profiles in these neural feature dimension maps are equivalently sensitive to screen regions made salient by increases and decreases in feature intensity (e.g., motion speed, color saturation), a manipulation previously used effectively to dissociate location-specific activation driven by stimulus intensity from local salience computations (Betz et al., 2013).
In conclusion, we found that feature-selective retinotopic ROIs compute maps of stimulus salience primarily based on feature contrast within their preferred feature dimension, confirming a key untested prediction of priority map theory. These results identify feature-selective retinotopic regions as the neural correlates of feature dimension maps within the priority map framework and support a new approach for probing the neural computations supporting visual cognition.
Footnotes
This work was supported by an Alfred P. Sloan Research Fellowship, an Nvidia Hardware Grant, a UC Santa Barbara Academic Senate Research Grant, and US Army Research Office Cooperative Agreement W911NF-19-2-0026 for the Institute for Collaborative Biotechnologies. We thank Kirsten Adam and John Serences for helpful comments on a draft of the manuscript.
The authors declare no competing financial interests.
Correspondence should be addressed to Daniel D. Thayer at danielthayer@ucsb.edu or Thomas C. Sprague at tsprague@ucsb.edu.