Abstract
Many studies of bottom-up visual attention have focused on identifying which features of a visual stimulus render it salient—i.e., make it “pop out” from its background—and on characterizing the extent to which salience predicts eye movements under certain task conditions. However, few studies have examined the relationship between salience and other cognitive functions, such as memory. We examined the impact of visual salience in an object–place working memory task, in which participants memorized the position of 3–5 distinct objects (icons) on a two-dimensional map. We found that their ability to recall an object's spatial location was positively correlated with the object's salience, as quantified using a previously published computational model (Itti et al., 1998). Moreover, the strength of this relationship increased with increasing task difficulty. The correlation between salience and error could not be explained by a biasing of overt attention in favor of more salient icons during memorization, since eye-tracking data revealed no relationship between an icon's salience and fixation time. Our findings show that the influence of bottom-up attention extends beyond oculomotor behavior to include the encoding of information into memory.
Introduction
Visual attention is governed by both top-down and bottom-up processes (Connor et al., 2004; Knudsen, 2007). Whereas top-down attention is slow, goal-directed, and under executive control (Miller and Cohen, 2001; Miller and D'Esposito, 2005), bottom-up visual attention is fast, automatic, and—as the term “bottom-up” implies—driven by the visual properties of the environment (Egeth and Yantis, 1997). Bottom-up processes are responsible for involuntarily guiding one's gaze toward salient visual stimuli, that is, stimuli that pop out from the background—for example, a red apple against a thatch of green leaves. (Note: Although some authors have used the term salience to describe the enhancement of a stimulus by either top-down or bottom-up mechanisms, here we use the term exclusively to denote a bottom-up effect.)
Many studies have examined the influence of visual salience on explicitly visual behaviors, such as saccade paths during free-viewing (Peters et al., 2005) and visual search tasks (Treisman and Gelade, 1980; Itti and Koch, 2000; Rao et al., 2002; Najemnik and Geisler, 2005). Other studies of bottom-up visual attention have focused on its relative influence versus top-down attention (Parkhurst et al., 2002; Elazary and Itti, 2008), its locus within the brain (Gottlieb et al., 1998; Constantinidis and Steinmetz, 2005), and the conditions under which its impacts are manifest (Henderson et al., 2007; Peters and Itti, 2008). However, little is known about the influence of salience on other cognitive functions, including memory. Here, we examine the impact of visual salience on performance in an object–place working memory task.
We asked a group of 12 participants to recall the position of 3–5 unique target icons on a map after a short delay. We found that target salience, as quantified using a computational model of bottom-up attention (Itti et al., 1998; Walther and Koch, 2006), was positively correlated with participants' performance. In particular, positional error decreased as target salience increased. Moreover, the strength of the relationship increased with increasing task difficulty. Finally, we found that the effect of salience on memory did not depend on eye movements; eye-tracking data revealed that participants fixated on each icon for an equal amount of time during memorization, regardless of that icon's salience.
Our findings show that bottom-up visual attention plays a larger role in cognition than has been described previously. Our results also suggest that the representation of visual salience within the brain extends beyond previously identified areas (Itti and Koch, 2001) and that salience information is likely communicated to higher-order centers involved in spatial working memory.
Materials and Methods
Twelve participants were seated 60 cm from a 1680 × 1050 pixel liquid crystal display monitor and asked to memorize the spatial location of 3–5 distinct icons superimposed on a map. We used a chin rest to limit participants' head movement. Participants ranged in age from 23 to 50 years old [31.1 ± 8.0 years (SD)]. All participants reported 20/20 vision (corrected) and all were naive as to the purpose of the experiment.
Two classes of map-based images were used—overhead satellite imagery taken from Google Earth, and elevation maps from the National Geospatial-Intelligence Agency. Icons were selected from the Department of Defense's common warfighting symbology (MIL-STD-2525B). Ten maps (five from each class, interspersed) were presented twice each in three sets, for a total of 20 trials per set. The number of icons was constant throughout each set. The position of each icon was pseudorandomly determined, but each participant saw the same configuration of icons. The angular size of an icon was ∼1° at the viewing distance. Icons varied in shape, color, and symbology (Fig. 1). In the 3- and 4-target cases, all icons were unique; that is, no icon appeared more than once on the same map. [In the 5-target case, it was necessary to present one icon twice, because of limitations in the size of the icon vocabulary. However, the repeated icons were separated on average by 13.4 ± 3.9 cm (SD), >4 times the average positional error, to prevent confusion.] The order of set presentation was counterbalanced across participants. Each set lasted ∼10 min, and participants rested for 3 min between sets.
After an image was presented, participants were asked to click on each icon “as quickly as possible.” Click locations were recorded and used to confirm that a participant had located all the icons. After the last icon was found, participants were given 4 s to memorize each icon's spatial location. The screen was then masked for 4 s, after which participants were asked to drag each icon back to its original position. During the recall phase, the map reappeared in 50% of trials (map-on condition); otherwise, the map window was left blank (map-off condition). No time limit was imposed, but participants were encouraged to make their “best guess” and not to dwell on the position of any single icon (Fig. 1).
We used an eye-tracking camera (SMI) to sample and record participants' eye position at 60 Hz. The camera uses infrared light to track the eye and compensates for limited head movement by tracking the corneal reflection. The camera was calibrated before each set using 13 calibration locations. After each set, we re-recorded participants' eye position (averaged over a 0.5 s fixation window) at each calibration location to assess calibration drift. On average, the drift was 0.42 ± 0.33° (SD) per calibration location over the course of each set.
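For reference, the drift measure can be scored along the following lines (a sketch in Python; the array layouts and function names are illustrative, not taken from the study's analysis code):

```python
import numpy as np

def calibration_drift(true_points, gaze_traces, fs=60.0, window_s=0.5):
    """Mean and SD of calibration drift, in degrees, across calibration
    locations. `true_points` is an (N, 2) array of calibration locations;
    `gaze_traces` holds one (samples, 2) gaze trace per location, recorded
    while the participant fixated that location after the set."""
    n = int(fs * window_s)                    # samples per 0.5 s window (30)
    drift = [np.linalg.norm(trace[:n].mean(axis=0) - point)
             for point, trace in zip(true_points, gaze_traces)]
    return np.mean(drift), np.std(drift)
```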
Target salience was quantified using a previously published computational model of bottom-up visual attention (Itti et al., 1998) (Fig. 2). The model contains feature channels—each with a similar computational architecture—that respond to color, intensity, and orientation contrast, respectively. First, the model filters the image at multiple spatial scales to form a dyadic Gaussian pyramid. Features are then extracted via center-surround filtering, approximated by subtracting layers of the pyramid. The features are then combined across scale and passed through a nonlinear normalization operator to suppress weaker activity, while enhancing near-global maxima. Finally, the output of each channel is combined and renormalized to generate a feature-independent salience map, which has been shown to predict eye position in free-viewing tasks (Ouerhani et al., 2004; Peters et al., 2005; Peters and Itti, 2008). We also validated the model using map-based images similar to those used in this experiment (unpublished data). Although other computational models of visual salience have also been proposed—including embellished versions of the salience model used here—we chose to use this salience model because of its biological plausibility, its success at predicting human eye movements, its ease of implementation, and its prominence within the literature. For a head-to-head comparison of the salience model with other models of bottom-up visual attention, see Peters and Itti (2008).
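For concreteness, a minimal single-channel sketch of this pipeline in Python follows. It implements only an intensity channel, and it replaces the model's normalization operator N(.) with a simplified variant of the (M − m̄)² weighting of Itti et al. (1998), using the global mean in place of the mean of local maxima. The full model adds color-opponency and Gabor-orientation channels built on the same architecture; function names and parameter choices here are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def gaussian_pyramid(img, levels=7):
    """Dyadic Gaussian pyramid: blur, then downsample by 2 at each level."""
    pyr = [img]
    for _ in range(levels - 1):
        pyr.append(gaussian_filter(pyr[-1], sigma=1.0)[::2, ::2])
    return pyr

def resize_to(m, shape):
    """Bilinear resize of a 2-D map to a target shape."""
    return zoom(m, (shape[0] / m.shape[0], shape[1] / m.shape[1]), order=1)

def normalize(m):
    """Simplified stand-in for the model's N(.) operator: rescale to [0, 1],
    then weight by (max - mean)^2, which promotes maps containing a single
    strong peak and suppresses maps with many comparable peaks."""
    m = (m - m.min()) / (np.ptp(m) + 1e-12)
    return m * (m.max() - m.mean()) ** 2

def channel_salience(img, centers=(2, 3), deltas=(2, 3), out_level=2):
    """Salience map for one feature channel (here, raw image intensity).
    Center-surround filtering is approximated by subtracting coarse pyramid
    levels (the surround) from finer ones (the center)."""
    pyr = gaussian_pyramid(np.asarray(img, dtype=float))
    out_shape = pyr[out_level].shape
    acc = np.zeros(out_shape)
    for c in centers:
        for d in deltas:
            cs = np.abs(pyr[c] - resize_to(pyr[c + d], pyr[c].shape))
            acc += resize_to(normalize(cs), out_shape)
    return normalize(acc)  # combined across scales and renormalized
```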
When computing target salience, we averaged all pixels within a target's boundary. Icon salience did not differ across the 3-, 4-, and 5-target conditions (ANOVA, p = 0.59; mean salience ± SD = 1.25 ± 0.05, 1.21 ± 0.05, and 1.18 ± 0.04, respectively). Positional error was defined as the straight-line distance between a target's veridical position and where a participant placed it during the recall phase. The time a participant attended to a target was defined as the number of eye-tracking samples recorded within 1° of the target's edge, divided by the sampling rate of the eye tracker.
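These three per-target measures reduce to a few lines of array arithmetic. The sketch below assumes gaze and target positions in degrees of visual angle, map positions in cm, and a boolean `mask` marking the pixels inside the icon's boundary; the 0.5° icon radius (half of the ∼1° icon size) and all names are assumptions.

```python
import numpy as np

def target_metrics(salience_map, mask, true_pos, placed_pos,
                   gaze_xy, target_xy, icon_radius_deg=0.5, fs=60.0):
    # Target salience: mean over all salience-map pixels within the icon.
    salience = salience_map[mask].mean()

    # Positional error: straight-line distance (cm) between the icon's
    # veridical position and where the participant placed it at recall.
    error = np.linalg.norm(np.asarray(true_pos) - np.asarray(placed_pos))

    # Attended time: gaze samples within 1 deg of the icon's edge, divided
    # by the 60 Hz sampling rate to convert a sample count to seconds.
    dist_to_edge = np.linalg.norm(gaze_xy - target_xy, axis=1) - icon_radius_deg
    attended_s = np.count_nonzero(dist_to_edge <= 1.0) / fs

    return salience, error, attended_s
```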
We used t tests to determine statistical significance. When using a t test to assess changes in slope, we computed the difference between slopes within participants (e.g., the 4-target slope minus the 3-target slope) and tested the differences across participants. p < 0.05 was considered significant.
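In terms of standard statistics libraries, these two uses of the t test reduce to a one-sample test of slopes against 0 and a paired test of slopes between conditions. A sketch, assuming one fitted slope per participant per condition:

```python
from scipy.stats import ttest_1samp, ttest_rel

def slope_vs_zero(slopes):
    """Is the mean slope across participants different from 0?"""
    return ttest_1samp(slopes, popmean=0.0)

def slope_change(slopes_a, slopes_b):
    """Did the slope change between conditions? Differences are formed
    within participants and tested across participants (a paired t test,
    equivalent to a one-sample test of the differences against 0)."""
    return ttest_rel(slopes_b, slopes_a)
```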
Results
To test whether the visual salience of an object affected performance in an object–place memory test, we asked 12 participants to memorize the spatial locations of 3–5 unique icons and then recall their positions after a 4 s delay. We began our analysis by averaging the data within a set. As expected, the positional error increased with the number of target icons (ANOVA, p < 0.001). On average, it equaled 2.12 ± 0.30 cm, 2.58 ± 0.24 cm, and 3.35 ± 0.47 cm in the 3-, 4-, and 5-target conditions, respectively (where the ± interval equals the 95% confidence interval of the mean).
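A sketch of these summary statistics, assuming one mean positional error per participant per condition (note that SciPy's `f_oneway` treats groups as independent; the within-participant design here may call for a repeated-measures ANOVA instead):

```python
import numpy as np
from scipy import stats

def mean_ci95(x):
    """Mean and half-width of the 95% confidence interval of the mean,
    using the t distribution with n - 1 degrees of freedom."""
    x = np.asarray(x, dtype=float)
    half = stats.sem(x) * stats.t.ppf(0.975, df=len(x) - 1)
    return x.mean(), half

# err3, err4, err5: per-participant mean errors (cm) in each condition.
# f_stat, p = stats.f_oneway(err3, err4, err5)
```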
Within a target condition, we found that average error was negatively correlated with target salience (Fig. 3A). The slope (Fig. 3B) of a linear fit was significantly different from 0 in the 4- (t test, p < 0.001) and 5-target (t test, p = 0.011) conditions (slope = −0.489 ± 0.188 cm and −0.859 ± 0.551 cm, respectively). It was not significantly different from 0 in the 3-target condition (t test, p = 0.33; slope = −0.190 ± 0.370 cm). Positional error was also correlated with a target's salience rank, where rank was calculated within an image and then averaged across images (Fig. 3C). The slope (Fig. 3D) of a linear fit was significantly different from 0 in the 4- (t test, p = 0.046) and 5-target (t test, p = 0.005) conditions (slope = −0.086 ± 0.079 cm and −0.191 ± 0.105 cm, respectively). Once again, it was not significantly different from 0 (t test, p = 0.32) in the 3-target condition (slope = 0.078 ± 0.144 cm). In addition, we found that the (absolute) slope (Fig. 3A) increased with increasing task difficulty, as quantified by the average positional error within a set (Fig. 4). This trend was well fit by a line (r² = 0.92); the slope of the fit was significantly different from 0 (generalized linear model, p < 0.001). In general, then, as performance decreased (as the error, on average, increased), the overall relationship between error and salience became more pronounced.
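One plausible implementation of the within-image rank computation is sketched below (Python; array layouts and names are assumptions): targets are ranked by salience within each image (rank 1 = most salient), and the positional error at each rank is then averaged across images before the linear fit.

```python
import numpy as np
from scipy.stats import linregress

def error_by_salience_rank(salience, error):
    """salience, error: (n_images, n_targets) arrays for one participant.
    Returns the mean positional error at each within-image salience rank."""
    order = np.argsort(-np.asarray(salience), axis=1)   # most salient first
    err_ranked = np.take_along_axis(np.asarray(error), order, axis=1)
    return err_ranked.mean(axis=0)                      # mean error per rank

# Slope of error vs. rank for one participant (then tested against 0 across
# participants, as described in Materials and Methods):
# mean_err = error_by_salience_rank(salience, error)
# slope = linregress(np.arange(1, len(mean_err) + 1), mean_err).slope
```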
In the recall phase, the map could reappear (the map-on condition) or not (the map-off condition). We asked whether a participant's ability to recall the position of an icon was affected by the retrieval condition. Within a target condition, we found that the error, on average, was larger in the map-off condition than in the map-on condition; the difference was significant in the 4-target condition (error = 2.30 ± 0.20 cm and 2.86 ± 0.22 cm in the map-on and map-off conditions, respectively; t test, p = 0.002) and approached significance in the 3-target (error = 1.93 ± 0.34 cm and 2.31 ± 0.21 cm, respectively; t test, p = 0.065) and 5-target conditions (error = 3.20 ± 0.25 cm and 3.49 ± 0.27 cm, respectively; t test, p = 0.10). In both retrieval conditions, the relationship between error and salience was the same as in Figure 3 (combined map-on and map-off). The slope of a linear fit was statistically indistinguishable between retrieval conditions (p = 0.70, 0.30, and 0.26 in the 3-, 4-, and 5-target conditions, respectively), and no single slope differed significantly from the combined map-on, map-off data reported in Figure 3 (t test, p > 0.25 for all).
Salience might also influence memory indirectly by biasing eye movements (a proxy for overt attention) in favor of more salient targets (Itti and Koch, 2001); that is, fixating on more salient targets for a longer duration could improve performance. To investigate whether this was the case, we examined participants' eye movements during the study phase of the task. We found that participants attended to all targets for an equal amount of time, regardless of their salience (Fig. 5A). The slope (Fig. 5B) of a linear fit (slope = 0.041 ± 0.060 s, −0.030 ± 0.060 s, and 0.025 ± 0.051 s for the 3-, 4-, and 5-target conditions, respectively) was not significantly different from 0 in any target condition (t test, p = 0.21, 0.35, and 0.36, respectively). We also asked whether click order or fixation order (during the study phase) was correlated with target salience; neither was, in any target condition (click order, p = 0.26, 0.70, and 0.60; fixation order, p = 0.25, 0.39, and 0.13 for the 3-, 4-, and 5-target conditions, respectively). In addition, we found no correlation between click order and fixation order (r² = 0.005). Finally, the order of target fixation during the study phase did not predict memory performance in any target condition (p = 0.25, 0.53, and 0.33, respectively). Based on these results, we conclude that bottom-up visual attention, as quantified by visual salience, significantly impacted participants' ability to recall the spatial location of objects in a display, regardless of participants' saccade patterns.
Discussion
The tendency to orient attention toward visually salient stimuli is conserved across species and likely confers an evolutionary advantage by enabling an organism to rapidly detect and react to behaviorally relevant objects and events within its environment (e.g., an approaching predator). Much research to date has sought to characterize the influence of salience on visual behaviors, in particular saccade patterns during free-viewing (Peters et al., 2005) and structured visual search tasks (Itti and Koch, 2000; Rao et al., 2002; Najemnik and Geisler, 2005). Still other studies have aimed to identify the visual features that contribute to salience and to build computational models—like the one used in the present study—that can produce a quantitative “salience map” for an image or video (Rosenholtz, 1999; Itti and Koch, 2001; Le Meur et al., 2006; Mancas et al., 2007). Yet beyond these studies, few investigators have examined the relationship between salience and other cognitive functions, including memory (Stirk and Underwood, 2007; Berg and Itti, 2008). This is surprising, given the otherwise well recognized impact of attention on memory (VanRullen and Koch, 2003; Droll et al., 2005; Chun and Turk-Browne, 2007; Wolfe et al., 2007; Cabeza et al., 2008) and memory on attention (Carmi and Itti, 2006; Soto et al., 2006; Ciaramelli et al., 2009).
We examined the relationship between bottom-up attention, as quantified by salience, and visual–spatial working memory. Our data reveal that salience, which reflects the low-level visual properties of a stimulus and its environment, affects human performance in an object–place working memory test. It is important to note that, in contrast with several related studies that examined the impact of attention on subjects' ability to recall object features or identity (Droll et al., 2005; Berg and Itti, 2008), the present study required subjects to recall an object's location in conjunction with its identity.
We found that participants' ability to recall the location of targets on a two-dimensional map increased with increasing target salience. Moreover, this effect became more pronounced as the difficulty of the task increased, suggesting that the brain's memory systems may use salience to prioritize information for encoding when resources are taxed. We also examined whether the relationship between salience and memory could be explained by overt attentional effects, that is, whether participants spent more time fixating the more salient targets. This was not the case. Instead, we observed that participants distributed their gaze equally among targets during memorization, which suggests that salience influences memory through covert means. Although we cannot completely rule out the possibility that salience also influences memory by biasing overt attention (overt effects might simply require longer than 4 s—the study phase of this task—to manifest), this would itself be noteworthy, since the impact of salience on eye movements during goal-directed tasks (like ours) is still a matter of debate in the literature (Henderson et al., 2007; Foulsham and Underwood, 2008; Peters and Itti, 2008; Mannan et al., 2009). In summary, we conclude that bottom-up visual attention can impact performance in a memory task, that it can do so covertly, and that it can do so even in a top-down task like the one used here, in which the task goal and targets are specified in advance.
Recently, another study that examined the intersection of bottom-up attention and memory came to a different conclusion. Berg and Itti (2008) asked participants to examine a shopping-related scene for 2 s and then asked whether a target item was contained in the scene. They found that fixation times, but not the salience of objects, were predictive of performance. The authors concluded that, to the extent that salience does contribute to memory, it does so mostly through influencing overt attention. Although this result seems to contradict our own findings, there are several key differences between the two studies that might explain the discrepancy. Foremost, the task of Berg and Itti (2008) is one of object recognition, whereas our task requires the recall of spatial information in conjunction with object identity. Several attempts to identify a salience map in the brain have focused on the dorsal visual pathway, which is differentially involved in the processing of visual–spatial information (Mishkin et al., 1983). For instance, salience encoding has been found in area 7a (Constantinidis and Steinmetz, 2005) and the lateral intraparietal area (LIP) (Gottlieb et al., 1998) in posterior parietal cortex, as well as in other cortical and subcortical areas involved in the planning and generation of eye movements (Robinson and Petersen, 1992; Fecteau et al., 2004; Shipp, 2004; Thompson and Bichot, 2005). Yet, the degree to which salience is represented in the ventral visual pathway, which governs object recognition, is less clear (but see Mazer and Gallant, 2003). Thus, an intriguing possibility is that salience affects the encoding of spatial memories more so than memories related to object identity. Additional experiments separating object and spatial components of memory will be required to test this hypothesis.
In general, the manner and extent to which salience impacts memory will likely depend on the nature of the task and the specific memory subsystems recruited. Which memory subsystems are involved in the current task, and what are the pathways by which salience-related information might influence these circuits? The paradigm used in the present study is similar to that of Olson et al. (2006), who showed that the medial temporal lobe, in particular the hippocampus, is critically involved in working memory tasks that require the conjunctive binding of object–place relationships (see also Finke et al., 2008; Hannula and Ranganath, 2008). Anatomical evidence reveals that the hippocampus receives projections directly from posterior parietal area 7a, which itself receives inputs from LIP, thereby providing a plausible pathway by which salience might affect the formation of hippocampal memories (Tsanov and Manahan-Vaughan, 2008). Additionally, prefrontal areas are well known to play a critical role in both spatial and object working memory (Fuster and Alexander, 1971; Levy and Goldman-Rakic, 2000). Anatomical data from nonhuman primates show that dorsolateral prefrontal cortex, which is closely implicated in spatial working memory, receives a strong projection from posterior parietal cortex (Romanski, 2004). Moreover, some parietal areas have been implicated in working memory function (Constantinidis and Procyk, 2004). Thus, the interaction of salience and working memory might begin locally within the posterior parietal cortex.
Further research will be needed to illuminate the differential impact of salience on distinct memory networks. Although the present study dealt with working memory, other recent work has looked at the interaction of attention with explicit versus implicit memory systems (Chun and Turk-Browne, 2007). Several researchers have proposed neurobiologically motivated functional taxonomies for memory that could serve as promising frameworks for investigating this issue further (Squire, 1986; Eichenbaum et al., 2007; Bird and Burgess, 2008). Another potential line of future inquiry concerns the contribution of individual feature channels (e.g., color, intensity, orientation) to salience-related memory effects (Koene and Zhaoping, 2007). Such studies, augmented with neuroimaging methods, may reveal more fully the processes and substrates by which bottom-up attention and memory interact in the brain.
Footnotes
- *M.S.F. and B.S.M. are joint first authors.
- We thank Julia High and Craig Haimson for suggestions on the experimental design and Jeff Colombe for comments on this manuscript.
- Correspondence should be addressed to Dr. Brandon S. Minnery, The MITRE Corporation, MS H205, 7515 Colshire Drive, McLean, VA 22102. minnery@gmail.com