Abstract
Although bottom-up attention can improve visual performance with and without awareness to the exogenous cue, whether they are governed by a common neural computation remains unclear. Using a modified Posner paradigm with backward masking, we found that the cueing effect displayed a monotonic gradient profile (Gaussian-like), both with and without awareness, whose scope, however, was significantly wider with than without awareness. This awareness-dependent scope offered us a unique opportunity to change the relative size of the attention field to the stimulus, differentially modulating the gain of attentional selection, as proposed by the normalization model of attention. Therefore, for each human subject (male and female), the stimulus size was manipulated as their respective mean attention fields with and without awareness while stimulus contrast was varied in a spatial cueing task. By measuring the gain pattern of contrast-response functions on the spatial cueing effect derived by visible or invisible cues, we observed changes in the cueing effect consonant with changes in contrast gain for visible cues and response gain for invisible cues. Importantly, a complementary analysis confirmed that subjects' awareness-dependent attention fields can be simulated by using the normalization model of attention. Together, our findings indicate an awareness-dependent normalization framework of visual bottom-up attention, placing a necessary constraint, namely, awareness, on our understanding of the neural computations underlying visual attention.
SIGNIFICANCE STATEMENT
Bottom-up attention is known to improve visual performance with and without awareness. We discovered that manipulating subjects' awareness can modulate their attention fields of visual bottom-up attention, which offers a unique opportunity to regulate its normalization processes. On the one hand, by measuring the gain pattern of contrast-response functions on the spatial cueing effect derived by visible or invisible cues, we observed changes in the cueing effect consonant with changes in contrast gain for visible cues and response gain for invisible cues. On the other hand, by using the normalization model of attention, subjects' awareness-dependent attention fields can be simulated successfully. Our study supports important predictions of the normalization model of visual bottom-up attention and further reveals its dependence on awareness.
Introduction
Covert attention, the selective processing of visual information at a given location in the absence of eye movements, can be attracted automatically by an exogenous cue, a process known as visual bottom-up attention. Numerous studies have demonstrated that bottom-up attention can improve visual performance with (Carrasco, 2011) and without awareness of the exogenous cue in various paradigms, such as visual backward masking (Zhang et al., 2012; Chen et al., 2016; Huang et al., 2020), crowding (Montaser-Kouhsari and Rajimehr, 2005), and continuous flash suppression (Jiang et al., 2006), as well as with subthreshold presentation (Zhang and Fang, 2012) and in a patient with blindsight (Kentridge et al., 1999a,b, 2004). However, it is unclear whether there is a common neural computation governing these improvements in visual performance with and without awareness.
There has been a long-standing debate about the neural computations underlying visual bottom-up attention. Experiments examining how it modulates visual performance and neuronal activity in visual cortex have found disparate attentional effects on stimulus-evoked neural responses, such as the contrast-response function (CRF) (Reynolds et al., 2000; Martínez-Trujillo and Treue, 2002). Some have reported that attentional selection primarily enhances neural responses to high-contrast stimuli (response gain) (McAdams and Maunsell, 1999; Ling and Carrasco, 2006; Kim et al., 2007; Lee and Maunsell, 2009), whereas others have reported that attentional selection primarily enhances neural responses to medium-contrast stimuli (contrast gain) (Reynolds et al., 2000; Martínez-Trujillo and Treue, 2002; Reynolds and Chelazzi, 2004). Still others have reported that attentional selection either enhances the entire contrast range or produces a combination of both response-gain and contrast-gain changes (Huang and Dobkins, 2005; Buracas and Boynton, 2007; Murray, 2008; Pestilli et al., 2011).
Crucially, these ostensibly conflicting results of the gain changes induced by visual attention can be explained by the normalization model of attention (Reynolds and Heeger, 2009), which proposes that attention-triggered improvements on perception hinge on two critical factors: the stimulus size and the attention field size. Changes in the relative size of these two factors can tip the balance between neuronal excitatory and inhibitory processes, thereby resulting in response-gain changes, contrast-gain changes, or various combinations of the two. Specifically, this model predicts that attention increases contrast gain when the stimulus is small and the attention field is large and increases response gain when the stimulus is large and the attention field is small. Remarkably, this prediction has been supported by previous psychophysical (Herrmann et al., 2010; Schwedhelm et al., 2016; Zhang et al., 2016; Schallmo et al., 2020), electroencephalography (Itthipuripat et al., 2014, 2019), and voxel-based fMRI (Hara et al., 2014) studies. However, little is known regarding whether visual bottom-up attention with and without awareness is governed by this common neural computation and how awareness could modulate its gain changes. Given neurons in frontoparietal areas have very large receptive fields and also play a prominent role in conscious awareness (Dehaene and Changeux, 2011), we speculated that manipulating subjects' awareness could modulate their bottom-up attention fields, thus offering a unique opportunity to regulate its normalization processes.
Here, using a modified Posner paradigm with backward masking, we manipulated the distance between the cue and probe (see Fig. 1) to measure the attention field with and without awareness (Distribution experiments). Results showed that the attention field was significantly wider with than without awareness, which offers a unique opportunity to change the relative size of the attention field to the stimulus, differentially modulating the gain of attentional selection. Consistent with this idea, by measuring the gain pattern of CRFs on the spatial cueing effect derived by visible or invisible cues (Normalization experiments), we observed changes in the cueing effect consonant with changes in contrast gain for visible cues and response gain for invisible cues. Additionally, using the classical normalization model of attention (Reynolds and Heeger, 2009), we simulated the attention field with and without awareness successfully. Our results thus support important predictions of the normalization model of visual bottom-up attention and further reveal its dependence on awareness.
Materials and Methods
Subjects
A total of 16 human subjects (8 male, 19-26 years old) were involved in the study. All of them participated in the Distribution experiments; 14 of them also repeated the Distribution experiments with decreased luminance of the visible cues and took part in the subsequent Normalization experiments. They were naive to the purpose of the study. They were right-handed, reported normal or corrected-to-normal vision, and had no known neurologic or visual disorders. They gave written, informed consent, and our procedures and protocols were approved by the human subjects review committee of School of Psychology at South China Normal University.
Apparatus
Visual stimuli were displayed on an IIYAMA color graphic monitor (model HM204DT; refresh rate: 60 Hz; resolution: 1280 × 1024; size: 22 inches) at a viewing distance of 57 cm. Subjects' head position was stabilized using a chin rest. A white fixation cross was always present at the center of the monitor.
Experimental design and statistical analysis
Distribution experiments
Stimuli
As illustrated in Figure 1, each texture stimulus contained 18 positions (the possible locations of the exogenous cue and probe) placed at an iso-eccentric distance from fixation (8.27° of visual angle): half of them were located in the left visual field, and the other half were located in the right visual field. The center-to-center distance between two neighboring positions was 1.35°. The exogenous cue was a low-luminance ring (8.9 cd/m²; inner diameter: 0.909°; outer diameter: 0.961°), while the probe was a rectangle of 0.104° × 0.831° in visual angle and was oriented at 45° or 135° away from the vertical. Low- and high-contrast masks, which had the same grid as the texture stimulus, rendered the exogenous cue visible or invisible (confirmed by a two-alternative forced choice) to subjects, respectively. Each mask ring contained two pairs of orthogonal circular arcs: one pair was white (19.3 and 79.8 cd/m² for low- and high-contrast masks, respectively), and the other pair was black (11.3 and 0.01 cd/m² for low- and high-contrast masks, respectively). The ring in the mask had the same size as the exogenous cue in the texture stimulus (see Fig. 1B,F).
Procedure
The Distribution experiment consisted of Main and Control experiments, which were the same, except for the luminance of the visible cue. The Main experiment used a high-luminance cue, and the Control experiment used a low-luminance cue to rule out the possibility that the awareness-dependent attention field could be explained by the difference in cueing effect between the visible and invisible conditions (see Results). Both the Main and Control experiments consisted of three parts: Parts 1-3. In each visual field, the probe position was constant and the exogenous cue position varied in Part 1 (i.e., the varied cue with constant probe; see Fig. 1A,C), whereas Part 2 was the converse (i.e., the constant cue with varied probe; see Fig. 1E,G) (the distinction between Parts 1 and 2 is addressed in Discussion). In both Parts 1 and 2, there were five possible distances between the exogenous cue and probe, ranging from D0 (cue and probe at the same location) through D4 (cue and probe four items away from each other). Subjects participated in Parts 1 and 2 on 2 different days, and the order of the two parts was counterbalanced across subjects. Part 3 checked the effectiveness of the awareness manipulation in both Parts 1 and 2 and always preceded them.
In both Parts 1 and 2, each trial began with the fixation. A cue frame with (the cue condition) or without (the non-cue condition) exogenous cue was presented for 50 ms, followed by a 100 ms mask (low- and high-contrast in visible and invisible conditions, respectively, confirmed by Part 3) and another 50 ms fixation interval. Then a probe line, oriented at 45° or 135° away from the vertical, was presented for 50 ms. Subjects were asked to press one of two buttons as rapidly and correctly as possible to indicate the orientation of the probe (45° or 135°). The cueing effect for each distance (D0-D4) was quantified as the difference between the reaction time of the probe task performance in the non-cue condition and that in the cue condition (see Fig. 1D,H). Note that the valid and invalid cue conditions of the classical Posner cueing paradigm (Posner et al., 1980) could not be used to measure the cueing effect at each distance (D0-D4), because the distance (visual angle) between the exogenous cue and probe would vary not only in the valid cue condition but also in the invalid cue condition.
Parts 1 and 2 differed in their trial structure. Part 1 consisted of 16 blocks of 96 trials, 48 for the left visual field and 48 for the right visual field. In each block and each visual field, an exogenous cue was equiprobably and randomly presented at 1 of the 9 positions (see Fig. 1A) in 40 trials (the cue condition) and was absent in the remaining 8 trials (the non-cue condition). The probe was always presented at the center of 9 positions (see Fig. 1C). Part 2 consisted of 16 blocks of 80 trials, 40 for the left visual field and 40 for the right visual field. In each block and each visual field, an exogenous cue always appeared (the cue condition) or was absent (the non-cue condition) in the center of 9 positions (see Fig. 1E) with equal probability; the probe appeared equiprobably and randomly across the possible 9 positions (see Fig. 1G).
The stimuli and procedure in Part 3 were the same as those in Parts 1 and 2, except that no probe was presented. Part 3 checked the effectiveness of the awareness manipulation in Parts 1 and 2 and always preceded them. In Part 3, all subjects underwent a two-alternative forced choice task to determine whether the masked cue was visible or invisible in a criterion-free way. After the presentation of a masked cue frame, subjects were asked to indicate on which side of fixation (left or right) they thought the cue had appeared. For all possible distances (D0-D4), their performance was significantly higher than chance with the low-contrast mask and not statistically different from chance with the high-contrast mask, providing an objective confirmation that the cue was indeed visible or invisible to subjects, respectively.
Model fitting and comparison
For each subject and each condition (visible and invisible), we fitted a monotonic model and two nonmonotonic models to the averaged cueing effect. The monotonic model was implemented as the Gaussian function, and the two nonmonotonic models were implemented as the Mexican Hat (i.e., a negative second derivative of a Gaussian function) and polynomial functions (Finke et al., 2008; Fang and Liu, 2019; Fang et al., 2019) as follows:
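A minimal sketch of one common parameterization of the three candidate profiles is given here for orientation. The parameter names (`amplitude`, `sigma`, `baseline`, `coeffs`) and the exact functional forms are assumptions made for illustration and may differ from the forms the authors fitted.

```python
import math

def gaussian(d, amplitude, sigma, baseline=0.0):
    """Monotonic profile: maximal at distance 0, falling off with |d|."""
    return baseline + amplitude * math.exp(-d**2 / (2.0 * sigma**2))

def mexican_hat(d, amplitude, sigma, baseline=0.0):
    """Negative second derivative of a Gaussian (up to scale):
    a central peak flanked by a negative surround."""
    u = d**2 / sigma**2
    return baseline + amplitude * (1.0 - u) * math.exp(-u / 2.0)

def polynomial(d, coeffs):
    """Polynomial in distance; coeffs = [c0, c1, c2, ...] (order is an assumption)."""
    return sum(c * d**k for k, c in enumerate(coeffs))

if __name__ == "__main__":
    # Distances D0-D4 in units of the 1.35 deg item spacing.
    distances = [i * 1.35 for i in range(5)]
    for d in distances:
        print(f"{d:5.2f}  {gaussian(d, 30.0, 1.5):7.2f}  {mexican_hat(d, 30.0, 1.5):7.2f}")
```

With these forms, the Gaussian decreases monotonically from D0 to D4, whereas the Mexican Hat crosses zero at one `sigma` and becomes negative beyond it, which is what makes the two profiles distinguishable in a model comparison.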
Normalization experiments
Stimuli
As illustrated in Figure 4C, the exogenous cue of the Normalization experiments was the same as that in Part 2 of the Distribution experiments; that is, the exogenous cue always appeared in the center of 9 positions in left or right hemifield at 8.27° eccentricity. The probe was a pair of gratings (spatial frequency: 1.7 cycles/°; phase: random) that were presented at the exogenous cue's locations in the left and right hemifields. The gratings were presented at five possible contrasts: 0.02, 0.08, 0.15, 0.40, and 0.70. For each subject, the diameter of grating was manipulated as their respective mean FWHM bandwidth for the Gaussian of bottom-up attention with and without awareness in Distribution experiments (see Fig. 4A) as follows: diameter of grating = (FWHMV + FWHMI)/2, where FWHMV and FWHMI are the fitted FWHM bandwidths of the Gaussian model for the visible and invisible conditions in Distribution experiments, respectively.
Procedure
The Normalization experiment consisted of two experiments (visible and invisible). Each trial began with central fixation. The exogenous cue, a low-luminance ring, randomly appeared at the center of 9 positions in left or right hemifield with equal probability, followed by a 100 ms mask (low- and high-contrast for visible and invisible conditions, respectively) and another 50 ms fixation interval. Then, a pair of gratings (with identical contrasts) was presented for 33 ms in the left and right hemifields, one of which was the target. Subjects were asked to press one of two buttons to indicate the orientation of the target grating (leftward or rightward tilted) and received auditory feedback if their response was incorrect. The target grating was indicated by a peripheral 100 ms response cue (0.4° black circular arc) above one of the grating locations, but not at the grating location to avoid masking. A congruent cue was defined as a match between the exogenous cue location and response cue location (half the trials); an incongruent cue was defined as a mismatch (half the trials) (see Fig. 4C). Subjects were explicitly told that the exogenous cue was randomized and uninformative about the target location. The Normalization experiment consisted of two sessions (visible and invisible), with the two sessions occurring on different days; the order of the two sessions was counterbalanced across subjects. Each session consisted of 64 blocks; each block had 80 trials, from randomly interleaving 16 trials from each of the five contrasts. Contrast varied from trial to trial in randomly shuffled order, and stimuli were presented briefly (i.e., 33 ms) to avoid any possible dependence of attentional state on stimulus contrast. The attentional effect for each grating contrast was quantified as the difference between the performance accuracy (d′) in the congruent and incongruent cue conditions.
Psychophysical data analysis
To quantitatively examine the pattern of gain (either contrast or response gain) separately for bottom-up attention with and without awareness, for each subject, performance—i.e., d′ = z (hit rate) – z (false alarm rate)—was assessed across experimental blocks for each contrast and each trial condition (congruent and incongruent). A rightward response to a rightward stimulus tilt was (arbitrarily) considered to be a hit, and a rightward response to a leftward stimulus was considered to be a false alarm. For each subject, the mean d′ CRFs obtained for congruent and incongruent trials were fit with the standard Naka–Rushton equation (Naka and Rushton, 1966) as follows:
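The Naka–Rushton function is commonly written as d′(c) = d′max · cⁿ / (cⁿ + c50ⁿ), where c is stimulus contrast, d′max the asymptotic performance, c50 the contrast yielding half of d′max, and n the slope. The sketch below computes d′ from hit and false alarm rates as defined above and evaluates this commonly used form; the parameter names and the example values are assumptions for illustration, not the authors' fitted values.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """d' = z(hit rate) - z(false alarm rate).
    Rates must lie strictly between 0 and 1 (inv_cdf is undefined at 0 and 1)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)

def naka_rushton(c, d_max, c50, n):
    """Commonly used Naka-Rushton form: d'(c) = d_max * c^n / (c^n + c50^n)."""
    return d_max * c**n / (c**n + c50**n)

if __name__ == "__main__":
    contrasts = [0.02, 0.08, 0.15, 0.40, 0.70]
    # Hypothetical parameters: a lower c50 mimics contrast gain,
    # a higher d_max mimics response gain.
    for c in contrasts:
        base = naka_rushton(c, 2.0, 0.15, 2.0)
        contrast_gain = naka_rushton(c, 2.0, 0.10, 2.0)
        response_gain = naka_rushton(c, 2.5, 0.15, 2.0)
        print(f"{c:4.2f}  {base:5.2f}  {contrast_gain:5.2f}  {response_gain:5.2f}")
```

In this parameterization, a contrast-gain change appears as a shift in c50 with d′max unchanged, whereas a response-gain change appears as a change in d′max with c50 unchanged.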
Model simulations
The normalization model of attention (Reynolds and Heeger, 2009) computes the response of an arbitrary single neuron to a given set of stimuli as follows:
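In the general formulation of Reynolds and Heeger (2009), the model response is the attention-weighted stimulus drive divided by the suppressive drive plus a constant. The form below is a sketch of that general formulation; the labels E (stimulus drive), S (suppressive drive), and s (suppressive pooling kernel, with * denoting convolution) are assumed here, while A(x,θ) and σ follow the usage in the text.

```latex
R(x,\theta) = \frac{A(x,\theta)\,E(x,\theta)}{S(x,\theta) + \sigma},
\qquad
S(x,\theta) = s(x,\theta) * \bigl[A(x,\theta)\,E(x,\theta)\bigr]
```

Because the attention field A multiplies the stimulus drive before the suppressive drive is pooled, the extent of A relative to the stimulus determines how strongly attention also scales the denominator.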
Conversely, the narrowed attention field led to response gain changes, since attentional gain (λ) enhanced the entire stimulus drive (αc), whereas its impact on the denominator S + σ was minimal. In this case:
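Read literally, the sentence above implies the approximation below, in which λ scales the numerator while the denominator is left essentially unchanged. This is a sketch inferred from the text, using only the symbols it names (λ, α, c, S, σ), and should not be taken as a reproduction of the authors' equation.

```latex
R(c) \approx \frac{\lambda\,\alpha c}{S + \sigma}
```

Multiplying the whole function by λ raises its asymptote without shifting it along the contrast axis, which is the signature of a response-gain change.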
Our results supported these predictions of the normalization model of attention by showing that manipulating subjects' awareness could modulate the field of visual bottom-up attention, which, in turn, affected its normalization processes (see Fig. 5). To further confirm this awareness-dependent normalization framework of visual bottom-up attention, we simulated our empirical data using custom MATLAB scripts based on the code of Reynolds and Heeger (2009) with four free parameters: the gain of attention [A(x,θ)], separately optimized for visible and invisible cue conditions, the normalization constant σ, the orientation tuning of attention field, and a scaling parameter to linearly scale simulated values to d′. Given the simulated attention fields [A(x,θ)] are in arbitrary units, only the relative values are meaningful (Reynolds and Heeger, 2009); in both the visible and invisible conditions, we thus calculated the correlation coefficients between the simulated attention field and experimental attention fields (i.e., the FWHM) across individual subjects (see Fig. 6).
Results
Distribution experiments
The Distribution experiment consisted of Main and Control experiments, which were the same, except for the luminance of the visible cue. The Main experiment used a high-luminance cue, and the Control experiment used a low-luminance cue to rule out the possibility that the awareness-dependent attention field could be explained by the difference in cueing effect between the visible and invisible conditions. Both the Main and Control experiments consisted of three parts: Parts 1-3. In each visual field, the probe position was constant (Fig. 1C) and the exogenous cue position varied (Fig. 1A) in Part 1 (varied cue with constant probe), whereas Part 2 was the converse (constant cue with varied probe; Fig. 1E,G). In both Parts 1 and 2, there were five possible distances between the exogenous cue and probe, ranging from D0 (cue and probe at the same location) through D4 (cue and probe four items away from each other). Subjects participated in Parts 1 and 2 on 2 different days, and the order of the two parts was counterbalanced across subjects. Part 3 checked the effectiveness of the awareness manipulation (visible and invisible) in both Parts 1 and 2 and always preceded them.
Main experiment in distribution experiments
During Part 3, subjects reported that they were unaware of the exogenous cue and could not detect which visual field contained it in the invisible condition. Their performances were not statistically different from chance [mean percent correct ± SEM, Part 1 (i.e., varied cue), D0: 47.656 ± 1.651%, D1: 48.242 ± 1.996%, D2: 49.609 ± 1.303%, D3: 45.508 ± 2.096%, D4: 49.609 ± 1.533%, all t(15) < 0.968, p > 0.348, ηp² < 0.500; Part 2 (i.e., constant cue): 49.503 ± 0.411%, t(15) = 0.436, p = 0.669, ηp² = 0.225]; for the visible condition, by contrast, their performance was significantly higher than chance (Part 1, D0: 98.828 ± 0.630%, D1: 98.438 ± 0.699%, D2: 97.266 ± 1.137%, D3: 96.875 ± 1.276%, D4: 96.094 ± 1.496%, all t(15) > 30.812, p < 0.001, ηp² > 15.911; Part 2: 99.503 ± 0.411%, t(15) = 52.557, p < 0.001, ηp² = 27.140). Furthermore, for Part 1, subjects' performances were submitted to a repeated-measures ANOVA with awareness (visible and invisible) and distance (D0-D4) as within-subjects factors. The main effect of distance (F(4,60) = 0.215, p = 0.929, ηp² = 0.014) and the interaction between the two factors (F(4,60) = 0.942, p = 0.446, ηp² = 0.059) were not significant, but the main effect of awareness was significant (F(1,15) = 1873.86, p < 0.001, ηp² = 0.992). These results indicate that our awareness manipulation was effective for both the visible and invisible conditions, and there was no significant difference in subject performance among five distances.
In both Parts 1 (Fig. 1D) and 2 (Fig. 1H), each trial began with the fixation. A cue frame with (the cue condition) or without (the non-cue condition) exogenous cue was presented for 50 ms, followed by a 100 ms mask (low- and high-contrast masks rendered the exogenous cue visible or invisible to subjects, respectively) and another 50 ms fixation interval. Then a probe line, oriented at 45° or 135° away from the vertical, was presented for 50 ms. Subjects were asked to press one of two buttons as rapidly and correctly as possible to indicate the orientation of the probe (45° or 135°). For each condition, a rightward response to a 45° line was (arbitrarily) considered to be a hit, a rightward response to a 135° line was considered to be a false alarm, and a leftward response to a 45° line was considered to be a miss. There was no significant difference in the false alarm rate, miss rate, or removal rate (i.e., correct reaction times shorter than 200 ms and beyond 3 SDs from the mean reaction time in each condition were removed) across conditions (all p > 0.05; Table 1). The cueing effect for each distance (D0-D4) was quantified as the difference between the reaction time of the probe task performance in the non-cue condition and that in the cue condition.
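The trimming rule and the cueing-effect measure described above can be sketched as follows. The function names and default arguments are hypothetical, and computing the mean and SD on the untrimmed reaction times of a condition is an assumption, since the text does not specify the order of the two exclusion steps.

```python
from statistics import mean, stdev

def trim_rts(rts_ms, floor_ms=200.0, n_sd=3.0):
    """Drop correct-trial RTs shorter than floor_ms or more than
    n_sd standard deviations from the condition mean.
    Assumption: mean and SD are taken over the untrimmed RTs."""
    m, s = mean(rts_ms), stdev(rts_ms)
    return [rt for rt in rts_ms if rt >= floor_ms and abs(rt - m) <= n_sd * s]

def cueing_effect(noncue_rts_ms, cue_rts_ms):
    """Cueing effect = mean RT (non-cue) - mean RT (cue); positive values
    mean faster responses when the exogenous cue was presented."""
    return mean(trim_rts(noncue_rts_ms)) - mean(trim_rts(cue_rts_ms))

if __name__ == "__main__":
    # Made-up reaction times (ms) for one distance, for illustration only.
    noncue = [500, 520, 480, 510, 490]
    cue = [450, 470, 430, 460, 440, 150]  # 150 ms falls below the 200 ms floor
    print(cueing_effect(noncue, cue))
```

Applying the function separately to the trials of each distance (D0-D4) yields the five values that make up one subject's cueing-effect profile.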
Figure 2A shows the cueing effect of each condition for both Parts 1 and 2; most of these cueing effects were significantly >0, indicating that the bottom-up attention of the subject was attracted to the exogenous cue location, allowing them to perform more proficiently in the cue condition than the non-cue condition of the probe task. In both Parts 1 and 2, a repeated-measures ANOVA with awareness (visible and invisible) and distances (D0-D4) as within-subjects factors showed that the interaction between these two factors (Part 1: F(4,60) = 9.921, p < 0.001, ηp² = 0.398; Part 2: F(4,60) = 3.36, p = 0.015, ηp² = 0.183), the main effect of awareness (Part 1: F(1,15) = 29.27, p < 0.001, ηp² = 0.661; Part 2: F(1,15) = 72.26, p < 0.001, ηp² = 0.828), and the main effect of distances (Part 1: F(4,60) = 80.08, p < 0.001, ηp² = 0.842; Part 2: F(4,60) = 86.30, p < 0.001, ηp² = 0.852) were all significant. Subsequent post hoc paired t tests revealed that the cueing effect decreased gradually with the distance in both Part 1 (the visible condition, D0 vs D1: t(15) = 6.56, p < 0.001, ηp² = 3.388, D1 vs D2: t(15) = 3.68, p = 0.023, ηp² = 1.900, D2 vs D3: t(15) = 4.36, p = 0.006, ηp² = 2.251, D3 vs D4: t(15) = 3.18, p = 0.063, ηp² = 1.642; the invisible condition, D0 vs D1: t(15) = 6.44, p < 0.001, ηp² = 3.326, D1 vs D2: t(15) = 4.13, p = 0.009, ηp² = 2.133, D2 vs D3: t(15) = 1.60, p = 1.000, ηp² = 0.826, D3 vs D4: t(15) = 0.60, p = 1.000, ηp² = 0.310) and Part 2 (the visible condition, D0 vs D1: t(15) = 6.72, p < 0.001, ηp² = 3.470, D1 vs D2: t(15) = 5.70, p < 0.001, ηp² = 2.943, D2 vs D3: t(15) = 1.28, p = 1, ηp² = 0.661, D3 vs D4: t(15) = 2.08, p = 0.554, ηp² = 1.074; the invisible condition, D0 vs D1: t(15) = 9.12, p < 0.001, ηp² = 4.710, D1 vs D2: t(15) = 2.27, p = 0.001, ηp² = 1.172, D2 vs D3: t(15) = 0.50, p = 1.000, ηp² = 0.258, D3 vs D4: t(15) = 0.40, p = 1.000, ηp² = 2.207).
These results indicated that the attentional effect induced by both visible and invisible exogenous cues was a monotonic gradient profile with a center maximum falling off gradually in the surround.
Subsequently, to further assess the shape of this attentional effect, we fitted a monotonic model and two nonmonotonic models to the average cueing effect across distances (D0-D4) in both visible and invisible conditions. The monotonic model was implemented as the Gaussian function, and the two nonmonotonic models were implemented as the Mexican Hat (i.e., a negative second derivative of a Gaussian function) and Polynomial functions (Finke et al., 2008; Fang and Liu, 2019; Fang et al., 2019). To compare these three models to our data, we first computed the Akaike information criterion (AIC) (Akaike, 1973) and the Bayesian information criterion (BIC) (Schwarz, 1978) with the assumption of a normal error distribution. Then, we calculated the likelihood ratio (LR) and Bayes factor (BF) of the monotonic model (Gaussian) over nonmonotonic models (Mexican Hat and Polynomial) based on AIC (Burnham and Anderson, 2002) and BIC (Wagenmakers, 2007) approximation, respectively. Results showed that, in both Parts 1 and 2, the LR/BF (Table 2, top) strongly favored the Gaussian model over both the Mexican Hat and Polynomial models (Fig. 2A). Notably, we also conducted similar model comparisons for each subject's data and found that, of 16 subjects, the Gaussian model was favored over both the Mexican Hat and Polynomial models in 11 (visible) and 10 (invisible) subjects for Part 1 and in 9 (visible) and 9 (invisible) subjects for Part 2 (Fig. 3, left). In addition, pooling the data from Parts 1 and 2 led to the same qualitative conclusion. The LR/BF (Table 2, top) strongly favored the Gaussian model over both the Mexican Hat and Polynomial models (Fig. 2A). The model comparison based on fitting individual data also demonstrated that the Gaussian model was favored over both the Mexican Hat and Polynomial models in 12 and 10 of 16 subjects during the visible and invisible conditions, respectively (Fig. 3, left). These results further constituted strong evidence for the monotonic gradient profile of visual bottom-up attention with and without awareness.
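As a sketch of how such a comparison can be computed, the code below uses the standard least-squares forms of AIC and BIC under normally distributed errors and converts the differences into an AIC-based evidence ratio and a BIC-based Bayes factor approximation. The function names are hypothetical, and the inputs (residual sums of squares, parameter counts) are assumed to come from fits performed elsewhere; this is not the authors' analysis code.

```python
import math

def aic_bic(rss, n, k):
    """AIC and BIC for a least-squares fit with normal errors.
    rss: residual sum of squares, n: number of data points,
    k: number of free parameters (same counting convention for all models)."""
    fit_term = n * math.log(rss / n)
    return fit_term + 2 * k, fit_term + k * math.log(n)

def evidence_for_gaussian(rss_gauss, k_gauss, rss_other, k_other, n):
    """Return (LR, BF) in favor of the Gaussian model over another model.
    LR = exp((AIC_other - AIC_gauss) / 2); BF = exp((BIC_other - BIC_gauss) / 2).
    Values above 1 favor the Gaussian model."""
    aic_g, bic_g = aic_bic(rss_gauss, n, k_gauss)
    aic_o, bic_o = aic_bic(rss_other, n, k_other)
    return math.exp((aic_o - aic_g) / 2.0), math.exp((bic_o - bic_g) / 2.0)

if __name__ == "__main__":
    # Made-up fit results for five distances (D0-D4), for illustration only.
    lr, bf = evidence_for_gaussian(rss_gauss=4.0, k_gauss=2,
                                   rss_other=8.0, k_other=2, n=5)
    print(lr, bf)
```

When two models have the same number of parameters, the penalty terms cancel and both ratios depend only on the ratio of the residual sums of squares.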
Our results indicated that the spatial focus of visual bottom-up attention with and without awareness was best explained by the monotonic (Gaussian) rather than the nonmonotonic models (Mexican Hat and Polynomial). To quantitatively examine the attention field of bottom-up attentional modulation, we fitted the cueing effects from D0-D4 with a Gaussian function and used the FWHM bandwidth of the Gaussian to quantify their attention fields. Results showed that the fitted FWHM bandwidth was significantly larger in the visible than the invisible condition for both Part 1 (t(15) = 3.015, p = 0.009, ηp² = 1.557; Fig. 2B, top) and Part 2 (t(15) = 4.863, p < 0.001, ηp² = 2.511; Fig. 2B, middle), as well as for the pooled data from two parts (t(15) = 4.745, p < 0.001, ηp² = 2.450; Fig. 2B, bottom), indicating a wider attention field of bottom-up attention with than without awareness. Notably, this awareness-dependent attention field could, in principle, be explained by the difference in cueing effect between the visible and invisible conditions. To examine this issue, we calculated the correlation coefficients between our fitted FWHM bandwidths and cueing effects across individual subjects. If a wider attention field with than without awareness were driven by a greater cueing effect in the visible than the invisible condition, then we would observe a significant correlation between these two measures across individual subjects.
However, for Parts 1 and 2, as well as for the pooled data from the two, compared with the invisible condition, the increased FWHM bandwidth was not significantly correlated with the increased peak cueing effect (i.e., the cueing effect of D0, Part 1: r = −0.294, p = 0.269, ηp² = 0.086; Part 2: r = 0.428, p = 0.098, ηp² = 0.183; Parts 1 and 2: r = 0.0484, p = 0.859, ηp² = 0.002) or the mean of cueing effects across distances (Part 1: r = −0.156, p = 0.564, ηp² = 0.024; Part 2: r = 0.343, p = 0.194, ηp² = 0.118; Parts 1 and 2: r = 0.0619, p = 0.820, ηp² = 0.004) in the visible condition (Fig. 2C), which goes against the cueing effect explanation. Together, our findings indicate a gradient profile of visual bottom-up attention with and without awareness, and show a wider attention field of visual bottom-up attention with than without awareness.
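For a Gaussian parameterized by its standard deviation σ, the FWHM bandwidth used above follows from FWHM = 2·sqrt(2·ln 2)·σ ≈ 2.355σ. The sketch below assumes this parameterization, which the text does not state explicitly; the function name is hypothetical.

```python
import math

def fwhm_from_sigma(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma

if __name__ == "__main__":
    sigma = 1.5  # hypothetical fitted width, in degrees of visual angle
    width = fwhm_from_sigma(sigma)
    # At half the FWHM from the peak, the Gaussian drops to half its maximum.
    half_height = math.exp(-(width / 2.0) ** 2 / (2.0 * sigma ** 2))
    print(width, half_height)
```

Because the FWHM is a fixed multiple of σ, comparing FWHM bandwidths between the visible and invisible conditions is equivalent to comparing the fitted σ values.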
Control experiment in distribution experiments
To directly exclude the cueing effect explanation (i.e., the awareness-dependent attention field could be explained by the difference in cueing effect between the visible and invisible conditions), we examined the attention field of bottom-up attention during visible and invisible conditions with no significant difference in the cueing effect between the two conditions. We manipulated the cueing effect of the visible condition by decreasing the luminance of its cue (which was still visible to subjects). Fourteen of our 16 subjects repeated the Distribution experiments using these low-luminance cues, and the repeated-measures ANOVA with awareness (visible and invisible) and distances (D0-D4) as within-subjects factors indicated that our manipulation was effective by showing that, in both Parts 1 and 2, the main effect of awareness (Part 1: F(1,13) = 3.283, p = 0.093, ηp² = 0.202; Part 2: F(1,13) = 1.472, p = 0.247, ηp² = 0.102) was not significant (Fig. 2D). Remarkably, our Control experiments provided the same qualitative conclusion as the Main experiments by indicating a gradient profile of visual bottom-up attention with and without awareness and showing a wider attention field of visual bottom-up attention with than without awareness. First, results of Part 3 in the Control experiment confirmed that our awareness manipulation was effective for both the visible and invisible conditions, and there was no significant difference in subject performance among five distances.
Subjects' performances were not statistically different from chance in the invisible condition [Part 1 (i.e., varied cue), D0: 48.661 ± 1.121%, D1: 50.893 ± 1.652%, D2: 47.991 ± 1.927%, D3: 49.330 ± 2.050%, D4: 48.214 ± 1.725%, all t(13) < 1.194, p > 0.254, ηp² < 0.662; Part 2 (i.e., constant cue): 50.536 ± 0.536%, t(13) = 1.000, p = 0.336, ηp² = 0.555], but were significantly higher than chance in the visible condition (Part 1, D0: 85.268 ± 3.315%, D1: 83.929 ± 3.747%, D2: 77.232 ± 2.749%, D3: 79.911 ± 3.774%, D4: 78.571 ± 2.336%, all t(13) > 7.926, p < 0.001, ηp² > 4.397; Part 2: 73.036 ± 1.086%, t(13) = 21.207, p < 0.001, ηp² = 11.764). Similarly, for Part 1, subjects' performances were submitted to a repeated-measures ANOVA with awareness (visible and invisible) and distance (D0-D4) as within-subjects factors. The main effect of distance (F(4,52) = 1.892, p = 0.126, ηp² = 0.127) and the interaction between the two factors (F(4,52) = 0.923, p = 0.458, ηp² = 0.066) were not significant, but the main effect of awareness was significant (F(1,13) = 171.254, p < 0.001, ηp² = 0.929). Second, the model comparisons from Part 1, Part 2, and Parts 1 and 2 provided convincing evidence that the Gaussian model was strongly favored over both the Mexican Hat and Polynomial models with the LR/BF (Table 2, bottom), based on both the group (Fig. 2D) and individual (Fig. 3, right) data. Third, more importantly, we confirmed that the fitted FWHM bandwidth was significantly larger in the visible than the invisible condition for both Part 1 (t(13) = 3.732, p = 0.003, ηp² = 2.070; Fig. 2E, top) and Part 2 (t(13) = 2.561, p = 0.024, ηp² = 1.421; Fig. 2E, middle) as well as for the pooled data from the two (t(13) = 3.752, p = 0.002, ηp² = 2.081; Fig. 2E, bottom).
Finally, the increased FWHM bandwidths in the visible condition relative to the invisible condition were not significantly predicted by the increased peak cueing effect (Part 1: r = 0.227, p = 0.434, η2 p = 0.052; Part 2: r = 0.094, p = 0.749, η2 p = 0.009; Parts 1 and 2: r = 0.151, p = 0.607, η2 p = 0.023) or the mean of cueing effects across distances (Part 1: r = 0.334, p = 0.243, η2 p = 0.112; Part 2: r = 0.140, p = 0.634, η2 p = 0.020; Parts 1 and 2: r = 0.235, p = 0.419, η2 p = 0.055) (Fig. 2F).
Normalization experiments
Our Distribution experiments demonstrated an awareness-dependent attention field of visual bottom-up attention, which offers a unique opportunity to change the size of the attention field relative to the stimulus, differentially modulating the gain of bottom-up attentional selection. Thus, for each subject, the diameter of grating (Fig. 4A) was manipulated as their respective mean FWHM bandwidth of the Gaussian with and without awareness, that is, the diameter of grating = (FWHMV + FWHMI)/2, where FWHMV and FWHMI are the fitted FWHM bandwidth of the Gaussian model for the visible and invisible conditions in Distribution experiments, respectively. Under this configuration, the attentional field was larger and smaller than the stimulus size for the visible and invisible cues, yielding a pattern that qualitatively resembled contrast gain or response gain, respectively (Fig. 4B). To examine these predictions, we used a modified version of the Posner paradigm to measure the cueing effect induced by the visible or invisible cue, as shown in Figure 4C. In both conditions, an exogenous cue, a low-luminance ring, randomly appeared at the center of 9 positions in left or right hemifield with equal probability, followed by a 100 ms mask (low- and high-contrast for visible and invisible conditions, respectively) and another 50 ms fixation interval. Then, a pair of gratings was presented for 33 ms in the left and right hemifields, and subjects were asked to press one of two buttons to indicate the orientation of one of two gratings; each was presented at five different contrasts (0.02, 0.08, 0.15, 0.40, and 0.70; the contrasts of both gratings were identical on any given trial and covaried across trials in random order). A response cue at gratings offset indicated the target grating, yielding congruent cue (the exogenous cue matched the response cue, half the trials) and incongruent cue (mismatched, half the trials) conditions (Fig. 4C). 
Comparing performance accuracy (d′) for congruent and incongruent trials revealed the spatial cueing effect for each target contrast.
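As an illustration of how the cueing effect can be derived from d′, the following Python sketch computes d′ for congruent and incongruent trials and takes their difference. It assumes the standard signal-detection formula d′ = z(hit rate) − z(false-alarm rate), with one of the two orientations arbitrarily treated as the "signal"; the helper names, the log-linear correction, and the trial counts are hypothetical and are not taken from the original analysis.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the z-transform


def rate(successes, failures):
    """Proportion with a log-linear correction.

    The correction is an assumption of this sketch; it keeps rates away
    from 0 and 1, where the z-transform would be infinite.
    """
    return (successes + 0.5) / (successes + failures + 1.0)


def dprime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate)."""
    hit_rate = rate(hits, misses)
    fa_rate = rate(false_alarms, correct_rejections)
    return _STD_NORMAL.inv_cdf(hit_rate) - _STD_NORMAL.inv_cdf(fa_rate)


def cueing_effect(congruent_counts, incongruent_counts):
    """Cueing effect at one contrast: d'(congruent) - d'(incongruent)."""
    return dprime(*congruent_counts) - dprime(*incongruent_counts)


# Hypothetical counts (hits, misses, false alarms, correct rejections)
# for a single contrast level; real data would supply one tuple per
# contrast, awareness condition, and cue congruency.
congruent = (18, 2, 2, 18)
incongruent = (14, 6, 6, 14)
print(cueing_effect(congruent, incongruent))
```

Repeating this at each of the five contrasts gives the congruent and incongruent psychometric functions whose difference is the cueing effect.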
The mean d′ plotted as psychometric functions of stimulus contrast and awareness (visible and invisible) is shown in Figure 5A. The visible condition yielded a pattern that qualitatively resembled contrast gain, and the invisible condition yielded a pattern that qualitatively resembled response gain. The measured psychometric function for awareness (visible and invisible) and trial conditions (congruent and incongruent) was fit with the standard Naka–Rushton equation (Naka and Rushton, 1966). The two parameters c50 (the contrast yielding half-maximum performance) and d′max (asymptotic performance at high-contrast levels) determined contrast gain and response gain, respectively. The exponent n (slope) was fixed at 2 in the current analysis (Reynolds and Heeger, 2009; Herrmann et al., 2010; Carandini and Heeger, 2012; Zhang et al., 2016). The d′max for awareness (visible and invisible) and trial conditions (congruent and incongruent) is shown in Figure 5 and was submitted to a repeated-measures ANOVA with awareness and trial condition as within-subjects factors. The main effect of awareness (F(1,12) = 8.915, p = 0.011, η2 p = 0.426), the main effect of the trial condition (F(1,12) = 70.366, p < 0.001, η2 p = 0.854), and the interaction between these two factors (F(1,12) = 71.311, p < 0.001, η2 p = 0.856) were all significant. Post hoc paired t tests showed that d′max of congruent trials was higher than that of incongruent trials for the invisible condition (t(12) = 12.166, p < 0.001, η2 p = 7.024; Fig. 5C, left), but not for the visible condition (t(12) = 1.784, p = 1.000, η2 p = 1.030; Fig. 5B, left); d′max for the invisible condition was higher than that for the visible condition in the congruent trials (t(12) = 5.163, p < 0.001, η2 p = 2.981), but not in the incongruent trials (t(12) = 1.098, p = 0.294, η2 p = 0.634). 
Similarly, for the c50, the main effect of awareness (F(1,12) = 5.468, p = 0.037, η2 p = 0.313), the main effect of trial condition (F(1,12) = 45.342, p < 0.001, η2 p = 0.791), and the interaction between these two factors (F(1,12) = 52.415, p < 0.001, η2 p = 0.814) were all significant. Post hoc paired t tests showed that c50 of congruent trials was lower than that of incongruent trials for the visible condition (t(12) = −9.303, p < 0.001, η2 p = −5.371, Fig. 5D, left), but not for the invisible condition (t(12) = −0.577, p = 0.575, η2 p = −0.333, Fig. 5E, left); c50 for the visible condition was lower than that for the invisible condition in the congruent trials (t(12) = −5.074, p < 0.001, η2 p = −2.929), but not in the incongruent trials (t(12) = 0.023, p = 0.982, η2 p = 0.013). These results thus suggest that gain modulation of bottom-up attentional selection depends on awareness.
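The distinction between the two gain patterns can be made concrete with a short sketch of the Naka–Rushton function, assumed here in its standard form without a baseline term, d′(c) = d′max · c^n / (c^n + c50^n), with n fixed at 2 as in the analysis above. The parameter values are hypothetical and chosen only to show the qualitative signatures: lowering c50 (contrast gain) produces a cueing effect that is largest at intermediate contrasts, whereas raising d′max (response gain) produces an effect that grows with contrast.

```python
def naka_rushton(contrast, d_max, c50, n=2.0):
    """Naka-Rushton contrast-response function (no baseline term assumed)."""
    return d_max * contrast ** n / (contrast ** n + c50 ** n)


# The five contrast levels used in the spatial cueing task.
CONTRASTS = [0.02, 0.08, 0.15, 0.40, 0.70]

# Hypothetical parameters for illustration only (not fitted values).
baseline = dict(d_max=2.0, c50=0.2)       # e.g., incongruent trials
contrast_gain = dict(d_max=2.0, c50=0.1)  # lower c50, same asymptote
response_gain = dict(d_max=3.0, c50=0.2)  # higher asymptote, same c50

for c in CONTRASTS:
    base = naka_rushton(c, **baseline)
    effect_cg = naka_rushton(c, **contrast_gain) - base
    effect_rg = naka_rushton(c, **response_gain) - base
    print(f"c={c:.2f}  contrast-gain effect={effect_cg:.3f}  "
          f"response-gain effect={effect_rg:.3f}")
```

In an actual fit, d′max and c50 would be estimated separately for each awareness and trial condition, and the cueing effect would be read from the difference between the congruent and incongruent parameter estimates.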
To evaluate further the role of awareness in the gain modulation of visual bottom-up attention, we calculated the correlation coefficients between the relative size of the attention field to the stimulus [i.e., (FWHMV – FWHMI)/2, where FWHMV and FWHMI are the fitted FWHM bandwidth of the Gaussian model for the visible and invisible conditions, respectively] and psychophysical measures (d′max and c50) across individual subjects. The relative size of the attention field to the stimulus in the visible condition significantly correlated with the c50 difference between congruent and incongruent trials (r = −0.602, p = 0.029, η2 p = 0.362; Fig. 5D, right), but not with the d′max difference between congruent and incongruent trials (r = −0.110, p = 0.721, η2 p = 0.012; Fig. 5B, right). Conversely, the relative size of the attention field to the stimulus in the invisible condition significantly correlated with the d′max difference between congruent and incongruent trials (r = 0.591, p = 0.033, η2 p = 0.349; Fig. 5C, right), but not with the c50 difference between congruent and incongruent trials (r = −0.011, p = 0.973, η2 p = 0.0001; Fig. 5E, right). These results thus demonstrate a close relationship between awareness and gain modulation of visual bottom-up attentional selection (response gain and contrast gain changes in psychophysical performance).
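The size measures used in this analysis can be sketched as follows. The grating diameter is the mean of the two fitted FWHM bandwidths, so the visible attention field exceeds the stimulus by (FWHMV − FWHMI)/2 and the invisible attention field falls short of it by the same amount. The function names and the per-subject values below are hypothetical, and the Pearson correlation is written out by hand only to keep the example self-contained.

```python
import math


def grating_diameter(fwhm_visible, fwhm_invisible):
    """Stimulus diameter: mean of the visible and invisible FWHM bandwidths."""
    return (fwhm_visible + fwhm_invisible) / 2.0


def relative_field_size(fwhm_visible, fwhm_invisible):
    """Size of the attention field relative to the stimulus.

    Equals FWHM_V minus the grating diameter (and the grating diameter
    minus FWHM_I).
    """
    return (fwhm_visible - fwhm_invisible) / 2.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-subject FWHM bandwidths and c50 cueing effects
# (congruent minus incongruent); these are not data from the study.
fwhm_v = [6.0, 7.5, 5.0, 8.0]
fwhm_i = [4.0, 4.5, 4.0, 5.0]
c50_diff = [-0.05, -0.09, -0.02, -0.10]

sizes = [relative_field_size(v, i) for v, i in zip(fwhm_v, fwhm_i)]
print(pearson_r(sizes, c50_diff))
```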
In addition, to further confirm this awareness-dependent normalization framework of visual bottom-up attention, we simulated our empirical data with the normalization model of attention (Fig. 6A) using custom MATLAB scripts based on the code of Reynolds and Heeger (2009) with four free parameters: the gain of attention [A(x,θ)], separately optimized for visible and invisible conditions, the normalization constant σ, the orientation tuning of the attention field, and a scaling parameter to linearly scale simulated values to performance (d′). Given that the simulated attention fields [A(x,θ)] are in arbitrary units, only the relative values are meaningful (Reynolds and Heeger, 2009); in both the visible and invisible conditions, we thus calculated the correlation coefficients between the simulated and experimental attention fields (i.e., the FWHM) across individual subjects. The simulated attention fields correlated significantly with the experimental attention fields in the visible condition (r = 0.792, p = 0.001, η2 p = 0.627) and marginally significantly in the invisible condition (r = 0.505, p = 0.079, η2 p = 0.255; Fig. 6B), further confirming that manipulating subjects' awareness could modulate the field of visual bottom-up attention, which, in turn, affected its normalization processes. Notably, given the small sample size in our study, the correlations reported here could be driven by a single value or a few extreme values. Further work is thus needed to address whether our conclusion can be replicated with a larger sample.
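To illustrate why the size of the attention field matters in this model, the following Python sketch implements a heavily reduced, one-dimensional (space-only) version of the normalization computation: the response is the attention-weighted stimulus drive divided by the spatially pooled drive plus the constant σ. It is not the MATLAB code used for the simulations and omits the orientation dimension and the fitting procedure; the grid, the Gaussian widths, the attentional peak gain, and σ are all hypothetical values.

```python
import math


def gaussian(x, sigma):
    """Peak-normalised Gaussian profile (value 1 at x = 0)."""
    return math.exp(-x * x / (2.0 * sigma * sigma))


# Hypothetical 1D spatial grid in arbitrary units, centred on the stimulus.
XS = [i * 0.5 for i in range(-60, 61)]


def response_at_centre(contrast, att_sigma, att_peak=2.0,
                       stim_sigma=3.0, pool_sigma=6.0, sigma_const=0.1):
    """Normalised response of the unit centred on the stimulus (x = 0).

    att_sigma is the width of the attention field; None means no attention.
    """
    def attention(x):
        # Attention field: 1 everywhere, plus a Gaussian bump at the cue.
        if att_sigma is None:
            return 1.0
        return 1.0 + (att_peak - 1.0) * gaussian(x, att_sigma)

    # Excitatory drive: attention field multiplied by the stimulus drive.
    excit = [attention(x) * contrast * gaussian(x, stim_sigma) for x in XS]

    # Suppressive drive at x = 0: excitatory drive pooled over space
    # with a Gaussian kernel whose weights sum to 1.
    weights = [gaussian(x, pool_sigma) for x in XS]
    suppress = sum(w * e for w, e in zip(weights, excit)) / sum(weights)

    centre = excit[len(XS) // 2]  # excitatory drive at x = 0
    return centre / (suppress + sigma_const)


def attention_ratio(contrast, att_sigma):
    """Attended response divided by unattended response."""
    return (response_at_centre(contrast, att_sigma)
            / response_at_centre(contrast, None))


for label, width in [("wide field", 12.0), ("narrow field", 1.0)]:
    low = attention_ratio(0.05, width)
    high = attention_ratio(1.0, width)
    print(f"{label}: ratio at low contrast={low:.2f}, at high contrast={high:.2f}")
```

With a wide attention field, the attentional benefit shrinks toward 1 at high contrast, a contrast-gain-like pattern, because attention scales the stimulus and suppressive drives almost equally. With a narrow field, more of the benefit persists at high contrast, a response-gain-like pattern, because the suppressive drive is enhanced only near its center.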
Discussion
The present results provide support for important predictions of the normalization model of visual bottom-up attention and further reveal its dependence on awareness. Specifically, we showed that visual bottom-up attention displayed a monotonic gradient profile, both with and without awareness, whose attention field, however, was significantly wider with than without awareness, thereby regulating its normalization processes. More importantly, these awareness-dependent attention fields can be simulated by using the classical normalization model of attention (Reynolds and Heeger, 2009), further supporting an awareness-dependent normalization framework of visual bottom-up attention.
Our findings can be viewed as identifying an awareness-dependent attention field of visual bottom-up attention. It might be argued that this result could be derived from the visible relative to invisible condition having some degree of endogenous attention. Particularly, in Part 1, subjects were aware that in the visible compared with the invisible condition the exogenous cue appeared randomly across the nine possible positions (Fig. 1A). Thus, subjects could have directed endogenous attention to all positions, increasing the attentional set (Couperus and Lydic, 2019), and yielding a wider scope of attentional modulation. Critically, it is important to note that, in our study, the task required subjects to discriminate the orientation of the probe; the exogenous cue was never task-relevant. Thus, subjects did not need to direct endogenous attention to these task-irrelevant cues. More importantly, this endogenous attention explanation could not account for the same qualitative conclusion in Part 2 since the same exogenous cue was always presented at the center of nine locations (Fig. 1E) during the visible and invisible conditions (i.e., there was the same attentional set between them). Additionally, our conclusion is based on a report-based paradigm in which subjects overtly push a button to report their percept. Several studies have argued that such report-based paradigms could be modulated by factors that are not directly related to the attention field, such as higher-level strategies, response history, experience, learning, response biases, and personality (Yeshurun, 2019). Using a no-report paradigm, such as recording subjects' pupillary light responses, Tkacz-Domb and Yeshurun (2018) revealed that the attention field was twofold larger than that estimated using the traditional report-based paradigm. 
In our study, subjects performed exactly the same task between visible and invisible conditions; thus, the awareness-dependent attention field evident here cannot be explained by this discrepancy between the report-based and no-report paradigms. However, its underlying neural basis could depend on whether subjects overtly report their percept. On the one hand, several theories of conscious awareness, including the neuronal global workspace (Dehaene and Changeux, 2011), information integration (Koch et al., 2016; Tononi et al., 2016), and higher-order (Lau and Rosenthal, 2011) theories, propose that the neural activity in frontoparietal cortex is essential for conscious awareness. Similar to our study, evidence from those theories typically used the report-based paradigm. Thus, although speculative, it is plausible that the wider attention field with than without awareness evident here may result from the increased activity in frontoparietal cortical areas. On the other hand, several studies have argued that such report-based paradigms do not dissociate the brain regions required for pure conscious experience from those involved in conscious access and reportability (Tsuchiya et al., 2015; Koch et al., 2016). Those studies, by contrast, found that posterior rather than frontoparietal cortical areas were activated when using a no-report paradigm, such as recording eye movements and pupil dilation (Aru et al., 2012; Frässle et al., 2014). In other words, the awareness-dependent attention field is more likely to be mediated by posterior cortical areas when using the no-report paradigm. Consequently, further work is needed using both report-based and no-report paradigms to examine the difference in attention field of visual bottom-up attention with and without awareness, as well as their distinct neural mechanisms.
The most parsimonious account of our results is that visual bottom-up attention interacts with the normalization processes depending on awareness. Importantly, this result cannot be explained by the strength of the cueing effect, the poststimulus cue, or an involvement of endogenous attention. First, both the visible and invisible cues were the same as those in the Distribution experiments, and no significant difference in cueing effect was found between the two (Fig. 2D). Second, although previous studies have suggested that the poststimulus cue (e.g., the response cue) can influence not only subjects' nonperceptual decision (Eckstein et al., 2013) but also the perception of stimuli presented before it (Sergent et al., 2013), the response cue in our study was totally randomized and uninformative about the target grating in both visible and invisible conditions; we thus believe that our psychophysical results cannot be explained by the response cue. Finally, subjects knew before each trial that the discrimination task was to be performed on one of two gratings and could therefore have directed endogenous attention to both; it is not known exactly how exogenous attention and this endogenous attention combine (Herrmann et al., 2010). However, subjects in our study performed exactly the same task in the visible and invisible conditions; this potential combination thus could not account for the observed awareness-dependent normalization processes of visual bottom-up attention.
Our data can be interpreted by a hypothesis that behavioral performance is limited by the neuronal activity with an additive, independent, and identically distributed noise, and the decision-making process with a maximum-likelihood decision rule (Jazayeri and Movshon, 2006; Pestilli et al., 2009). Performance accuracy d′, used in both previous studies (Herrmann et al., 2010; Zhang et al., 2016) and the present one, is proportional to the signal-to-noise ratio of the underlying neuronal responses. Thus, it can reflect any change in neuronal contrast-response functions (CRFs) in our study. Indeed, we found a change in the cueing effect consonant with a change in contrast gain of the CRF for bottom-up attention with awareness and a change in response gain of the CRF for bottom-up attention without awareness (Fig. 5A). These awareness-dependent gain modulations of visual bottom-up attentional selection support and extend the normalization model of attention (Reynolds and Heeger, 2009). This model proposes that, in the absence of attention (e.g., in the incongruent cue condition), two factors determine the firing rate of a visually responsive neuron. One is the stimulus drive (excitatory component) determined by the contrast of the stimulus placed in the receptive field of a neuron. The other is the suppressive drive (inhibitory component) determined by the summed activity of other neighboring neurons, which serves to normalize the overall spike rate of the given neuron via mutual inhibition (Heeger, 1992). Attention (e.g., in the congruent cue condition) modulates the pattern of neural activity by altering the balance between these excitatory and inhibitory components, depending on the size of the attention field relative to the stimulus size, thereby exhibiting response gain changes, contrast gain changes, and various combinations of the two. In our study, given that the attention field of visual bottom-up attention was significantly wider with than without awareness (Fig. 2), for each subject, the size of the target stimuli in the spatial cueing task was set to the mean of their attention fields with and without awareness (Fig. 4A). Thus, relative to the stimulus size, the broadened attention field by visible exogenous cues led to contrast gain changes because attentional gain was applied equally to the stimulus and suppressive drives. Conversely, the narrowed attention field by invisible exogenous cues led to response gain changes because attentional gain enhanced the entire stimulus drive, but only enhanced the center of the suppressive drive. Indeed, using the classical normalization model of attention, we successfully simulated these broadened and narrowed attention fields of visible and invisible cues, respectively (Fig. 6), further supporting an awareness-dependent normalization framework of visual bottom-up attention.
Notably, evidence from neurophysiological and brain imaging studies indicates controversies concerning the brain regions involved in visual bottom-up attention, such as subcortical structures (Shipp, 2004; Fecteau and Munoz, 2006), visual cortical areas (Mazer and Gallant, 2003; Zhang et al., 2012), and frontoparietal cortical areas (Corbetta and Shulman, 2002; Bisley and Goldberg, 2010; Squire et al., 2013). An important factor in this controversy is awareness of the exogenous cue, which determines whether the identified neural substrate reflects pure bottom-up attention or not (Zhang et al., 2012; Chen et al., 2016; Huang et al., 2020). Intriguingly, our results are consistent with this idea by showing an awareness-dependent normalization framework of visual bottom-up attention. Although normalization as a neural computation likely occurs throughout the whole brain (Carandini and Heeger, 2012), the observed neural correlates of its interaction with visual bottom-up attention could also depend on awareness, and further studies using neurophysiological or brain imaging techniques will shed light on this issue.
In conclusion, manipulating subjects' awareness can modulate the attention field of visual bottom-up attentional modulation, which, in turn, affects its normalization processes. Our study provides, to the best of our knowledge, the first experimental evidence supporting an awareness-dependent normalization framework of visual bottom-up attention, thereby furthering our understanding of the neural computations underlying visual attention, the relationship between attention and awareness, and how they interactively shape our experience of the world.
Footnotes
This work was supported by National Outstanding Youth Science Fund Project of National Natural Science Foundation of China Project 32022032; National Natural Science Foundation of China General Program 31871135; and Key Realm R&D Program of Guangzhou 202007030005. We thank David Heeger and Ruyuan Zhang for valuable comments.
The authors declare no competing financial interests.
Correspondence should be addressed to Xilin Zhang at xlzhang@m.scnu.edu.cn